This invention relates to antigens from Neisseria bacteria.
BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, gram negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the 
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In: 
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker, 
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al. (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med
337(14):970-976). In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at 
10-20% in the United States, and much higher in developing countries. Following the introduction 
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of 
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol.46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for 
approximately 50% of total meningitis in the United States, Europe, and South America. The 
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer 
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins 
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4 
OMPs that are believed to induce antibodies that block bactericidal activity. This approach 
produces vaccines that are not well characterized. They are able to protect against the homologous 
strain, but are not effective at large where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus 
B, some could be components of vaccines against all meningococcal serotypes, and others could 
be components of vaccines against all pathogenic Neisseriae. 

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae. 

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the 
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence, 
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These 
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
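
By way of illustration only, percent identity under a local alignment with affine gaps can be sketched as follows. This is not the MPSRCH implementation. The match and mismatch scores (a real search would use a substitution matrix), the convention that the first residue of a gap costs the gap open penalty and each further residue costs the gap extension penalty, the definition of identity as identical positions divided by alignment length, and the function name are all assumptions made for this sketch.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Local alignment with affine gaps; returns (best score, percent identity).

    Assumed scoring: fixed match/mismatch scores instead of a substitution
    matrix; a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    # H: best local score ending at (i, j); E: ending in a gap that consumes b;
    # F: ending in a gap that consumes a.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting aligned columns and identities.
    i, j = best_pos
    state, identical, length = "H", 0, 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # start of the local alignment
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if H[i][j] == diag:
                identical += int(a[i - 1] == b[j - 1])
                length += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            length += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this column opened the gap
            j -= 1
        else:
            length += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    percent = 100.0 * identical / length if length else 0.0
    return best, percent
```

Under these assumptions, two ten-residue peptides that differ at a single position align over their full length with 90% identity, which would fall within the preferred range of greater than 50%.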

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12,
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence.

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means. 

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid 
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide 
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid 
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution). 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).
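
The fragment criterion (at least n consecutive residues or nucleotides taken from a disclosed sequence) can be checked with a simple window comparison. The sketch below is illustrative only; the function name is hypothetical, and any sequences passed to it in practice would be those of the examples rather than the placeholder strings used for testing.

```python
def shares_fragment(candidate: str, reference: str, n: int) -> bool:
    """Return True if candidate contains at least n consecutive residues
    (amino acids or nucleotides) that also occur consecutively in reference."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if len(candidate) < n or len(reference) < n:
        return False
    # Every window of length n in the reference sequence.
    windows = {reference[i:i + n] for i in range(len(reference) - n + 1)}
    # Any shared run of n or more residues necessarily contains a shared window of length n.
    return any(candidate[i:i + n] in windows
               for i in range(len(candidate) - n + 1))
```

For a protein fragment, n would be 7 or more; for a nucleic acid fragment, 10 or more.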

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (eg. for antisense or probing purposes).
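
As a minimal sketch of what a complementary sequence means in practice, the reverse complement of a nucleotide sequence can be derived as shown below. The function name is hypothetical, the output is written as DNA (U in the input is treated like T), and ambiguity codes are left unchanged as a simplifying assumption.

```python
# Translation table for Watson-Crick pairing; U is read as T so RNA input is accepted.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence, written 5' to 3' as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]
```

A probe or antisense molecule designed this way would, under suitable conditions, be expected to base-pair with the strand from which it was derived.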

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.).

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, 
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, 
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use 
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid, 
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating 
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the 
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any 
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain 
B or strain C). 

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means. 

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a) 
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes. 

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. 
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required. 

General

The practice of the present invention will employ, unless otherwise indicated, conventional 
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are 
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C. C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions 

A composition containing X is "substantially free" of Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of 
X+Y in the composition, more preferably at least about 95% or even 99% by weight. 
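
The definition above reduces to a weight fraction, X/(X+Y), compared against a threshold. The following sketch is purely illustrative: the function names are hypothetical, the 0.85 default mirrors the 85% figure in the definition, and the masses may be in any unit so long as both use the same one.

```python
def fraction_x(mass_x: float, mass_y: float) -> float:
    """Weight fraction of X in the combined X + Y content of a composition."""
    total = mass_x + mass_y
    if mass_x < 0 or mass_y < 0 or total <= 0:
        raise ValueError("masses must be non-negative and not both zero")
    return mass_x / total


def is_substantially_free(mass_x: float, mass_y: float, threshold: float = 0.85) -> bool:
    """True when X makes up at least `threshold` of the total X + Y by weight."""
    return fraction_x(mass_x, mass_y) >= threshold
```

The preferred levels of about 90%, 95% or 99% can be tested by passing a different threshold value.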

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising" 
X may consist exclusively of X or may include something additional to X, such as X+Y.

The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having 
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of 
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems; 
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA 
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') 
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences 
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include 
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter,
expression can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described 
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can 
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with 
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed 
upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:161] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be 
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide. 

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating 
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that 
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing 
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells. 

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by
site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include 
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, 
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct 
microinjection of the DNA into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines. 

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, 
and is operably linked to the control elements within that vector. Vector construction employs 
techniques which are known in the art. Generally, the components of the expression system include 
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the 
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above 
described components, comprising a promoter, leader (if desired), coding sequence of interest, and 
transcription termination sequence, are usually assembled into an intermediate transplacement 
construct (transfer vector). This construct may contain a single gene and operably linked regulatory 
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing 
it to be maintained in a suitable host for cloning and amplification. 

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 77:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol. 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any 
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or 
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed 
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused 
foreign proteins usually requires heterologous genes that ideally have a short leader sequence 
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with 
cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted 
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised 
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids 
which direct the translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor 
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer 
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene
(Miller et al., (1989), Bioessays 4:91). The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
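
The practical consequence of a 1% to 5% recombination frequency can be illustrated with a short calculation. The following Python sketch is not part of the described method; it assumes, purely for illustration, that each plaque is independently recombinant with a fixed probability, and the function name `plaques_to_screen` is hypothetical. It estimates how many plaques would have to be inspected to find at least one occlusion-negative plaque with a chosen confidence.

```python
import math


def plaques_to_screen(recomb_fraction, confidence=0.95):
    """Smallest number of plaques n such that P(at least one recombinant) >= confidence.

    Assumption (illustrative only): plaques are independent and each one is
    recombinant with probability `recomb_fraction`.
    """
    if not 0 < recomb_fraction < 1:
        raise ValueError("recomb_fraction must lie strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # P(no recombinant among n plaques) = (1 - f)^n, which must be <= 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - recomb_fraction))


if __name__ == "__main__":
    for f in (0.01, 0.05):
        print(f"recombination frequency {f:.0%}: screen {plaques_to_screen(f)} plaques")
```

Under these assumptions, roughly 300 plaques would be needed at the 1% end of the range and about 60 at the 5% end, which is why a rapid visual screen is valuable.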

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally
known to those skilled in the art. See, eg. Summers and Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product
gene is under inducible control, the host may be grown to high density, and expression induced.
Alternatively, where expression is constitutive, the product will be continuously expressed into the
medium and the nutrient medium must be continuously circulated, while removing the product of
interest and augmenting depleted nutrients. The product may be purified by such techniques as
chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.;
electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the
product may be further purified, as required, so as to remove substantially any insect proteins which
are also secreted in the medium or result from lysis of insect cells, so as to provide a product which
is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are
incubated under conditions which allow expression of the recombinant protein encoding sequence.
These conditions will vary, dependent upon the host cell selected. However, the conditions are
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art.
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found, in addition to the references described above, in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by
gibberellic acid, can be found in R.L. Jones and J. MacMillin, Gibberellins, in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome
are also recommended. These might include transposon sequences and the like for homologous
recombination as well as Ti sequences which permit random insertion of a heterologous expression
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions
may also be present in the vector, as is known in the art.

The nucleic acid molecules of the subject invention may be included into an expression cassette
for expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes
equipped with one), and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code,
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol. Gen. Genet. 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl.
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the
presence of plasmids containing the gene construct. Electrical impulses of high field strength
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browaalia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible
and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or,
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters.
For example, transcription activation sequences of one bacterial or bacteriophage promoter may
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
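
The geometry described above (an SD element of 3-9 nucleotides placed 3-11 nucleotides upstream of the ATG) can be checked on a candidate construct with a simple scan. The following Python sketch is illustrative only. It assumes that the SD element is a contiguous fragment of TAAGGAGGT, the DNA form of the sequence complementary to the 3' end of E. coli 16S rRNA, and that the 3-11 nucleotide distance is counted as the gap between the end of the SD fragment and the A of the ATG; the function name `find_sd` and the test sequences are hypothetical.

```python
# DNA-sense sequence complementary to the 3' end of E. coli 16S rRNA
# (assumed here as the reference against which SD fragments are matched).
CONSENSUS = "TAAGGAGGT"


def find_sd(seq, atg_index, min_len=3, max_len=9, min_gap=3, max_gap=11):
    """Return (start, fragment, gap) for the longest consensus fragment found
    upstream of the ATG at `atg_index`, or None if no fragment is found.

    `gap` is the number of nucleotides between the fragment and the ATG
    (an assumption about how the 3-11 nucleotide spacing is measured).
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    # Try the longest fragments first so the strongest match is reported.
    for length in range(max_len, min_len - 1, -1):
        for gap in range(min_gap, max_gap + 1):
            start = atg_index - gap - length
            if start < 0:
                continue
            fragment = seq[start:start + length]
            if fragment in CONSENSUS:
                return start, fragment, gap
    return None


if __name__ == "__main__":
    construct = "GGCTAAGGAGGTTTTAACATGAAA"  # hypothetical test sequence
    print(find_sd(construct, construct.find("ATG")))
```

A construct for which the scan finds no fragment, or only a 3-nucleotide fragment, may have a weak ribosome binding site; exact substring matching is a deliberate simplification and does not model partial base pairing.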

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).
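
Cyanogen bromide cleaves the peptide bond on the C-terminal side of methionine residues generally, so it is worth checking a candidate sequence for internal methionines before choosing this route. The following Python sketch is illustrative only; the function name `cnbr_fragments` and the example peptide are hypothetical, and the model assumes complete cleavage after every methionine.

```python
def cnbr_fragments(protein):
    """Split a one-letter amino acid sequence after every methionine (M),
    modelling idealised, complete cyanogen bromide cleavage."""
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            # Cleavage occurs on the C-terminal side of the methionine.
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Hypothetical peptide with an N-terminal and one internal methionine.
    print(cnbr_fragments("MKTMAYK"))
```

If the result contains more than two fragments, the protein carries internal methionines and would be fragmented; in that case the enzymatic route (methionine N-terminal peptidase) may be the more suitable option.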

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cell gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the
cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic
space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably
there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal
peptide fragment and the foreign gene.
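
Because the signal peptide is characterised by a stretch of hydrophobic amino acids, a candidate leader can be screened by locating its most hydrophobic window. The following Python sketch is illustrative only. It uses the Kyte-Doolittle hydropathy scale, which is an assumption of this example and not a scale named in this specification; the function name `max_window_hydropathy`, the window length, and the example sequence are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein, window=8):
    """Return (start_index, mean_hydropathy) of the most hydrophobic window."""
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("sequence is shorter than the window")
    scores = [
        sum(KD[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical leader: charged N-terminus followed by a hydrophobic core.
    start, score = max_window_hydropathy("MKKTLLLLLLLLAGSAQA")
    print(start, round(score, 2))
```

A strongly positive window near the N-terminus is consistent with a hydrophobic core; the scan does not by itself predict secretion or a cleavage site.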

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription.
Examples include transcription termination sequences derived from genes with strong promoters,
such as the trp gene in E. coli as well as other biosynthetic genes.
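
A region capable of forming a stem loop contains an inverted repeat: a short sequence followed, after a few unpaired nucleotides, by its reverse complement. The following Python sketch is illustrative only; it looks for perfect inverted repeats of a fixed stem length, which is a simplification (real terminators tolerate mismatches and G-U pairs), and the function names, default parameters, and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, left_arm, loop_length, right_arm) for every perfect
    inverted repeat with the given stem length and an allowed loop length."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = revcomp(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) == stem and right == target:
                hits.append((i, left, loop, right))
    return hits


if __name__ == "__main__":
    # Hypothetical terminator-like region: GC-rich stem, 4-nt loop, T-rich tail.
    region = "CCGCCCGCTAATGCGGGCTTTTTTTT"
    print(find_hairpins(region))
```

Longer stems, mismatch tolerance, or a thermodynamic folding tool would be needed for anything beyond a first-pass check.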

Usually, the above described components, comprising a promoter, signal sequence (if desired),
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of
the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from
recombinations between homologous DNA in the vector and the bacterial chromosome. For
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of bacterial strains that have been transformed. Selectable markers can
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655]; Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318, Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:173, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene.
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway, therefore sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Miyanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples of transcription terminator sequences include yeast-recognized
termination sequences such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 5:17-24], pCl/1 [Brake et al.
(1984) Proc. Natl. Acad. Sci USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20, copies. Either a high or low copy number
vector may be selected, depending upon the effect of the vector and the foreign protein on the host.
See eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2 and
TRP1, as well as ALG7 and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metals. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz, et al. (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze, et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson, et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al. (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al.
(1990) Bio/Technology 5:135], Pichia guilliermondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg, et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow, et al. (1985) Curr. Genet. 10:39;
Gaillardin, et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 5:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 μg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by its
ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of
this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled
with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be
readily apparent to those of ordinary skill in the art, and are considered as equivalents within the scope
of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the 
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
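
The per-kilogram ranges above scale linearly with body mass. The following sketch shows that
arithmetic only; the function name `dose_range_mg` and the 70 kg body mass are illustrative
assumptions, and the result is not a dosing recommendation.

```python
def dose_range_mg(body_mass_kg: float,
                  low_mg_per_kg: float = 0.01,
                  high_mg_per_kg: float = 50.0) -> tuple[float, float]:
    """Return the (minimum, maximum) total dose in mg for a given body mass."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return body_mass_kg * low_mg_per_kg, body_mass_kg * high_mg_per_kg


# Hypothetical 70 kg subject (assumed value, not taken from the text)
broad_range = dose_range_mg(70.0)               # 0.01 to 50 mg/kg range
narrow_range = dose_range_mg(70.0, 0.05, 10.0)  # 0.05 to 10 mg/kg range
print(broad_range, narrow_range)
```

For the assumed 70 kg subject, the broader range corresponds to roughly 0.7 mg to 3500 mg of
DNA construct and the narrower range to roughly 3.5 mg to 700 mg.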

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as 
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids 
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of 
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack 
Pub. Co., N. J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid, 
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does 
not itself induce the production of antibodies harmful to the individual receiving the composition. 
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins, 
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, 
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well 
known to those of ordinary skill in the art. Additionally, these carriers may function as 
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to 
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens. 

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/nucleic
acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection 
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for 
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, 
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). 
Additional formulations suitable for other modes of administration include oral and pulmonary 
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose 
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents. 

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses
(eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291)), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For 
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of 
second strand synthesis from an Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known 
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create 
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant 
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum. 

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chartejee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.
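
The D-sequence rule described in the paragraph above (20 native nucleotides per inverted terminal
repeat, of which 5 to 18 are retained, preferably 10 to 18, most preferably 10) can be sketched as a
simple position-by-position check. This is an illustrative sketch only: it assumes modification by
substitution alone (deletions are not modelled), and the sequences used are hypothetical
placeholders, not the actual AAV D-sequence.

```python
D_SEQUENCE_LENGTH = 20  # native D-sequence length stated in the text


def retained_native_count(native: str, modified: str) -> int:
    """Count positions where the modified D-sequence keeps the native nucleotide."""
    if len(native) != D_SEQUENCE_LENGTH or len(modified) != D_SEQUENCE_LENGTH:
        raise ValueError("substitution-only comparison needs two 20-nucleotide sequences")
    return sum(n == m for n, m in zip(native.upper(), modified.upper()))


def classify_modification(native: str, modified: str) -> str:
    """Map the retained-nucleotide count onto the preference tiers in the text."""
    retained = retained_native_count(native, modified)
    if retained == 10:
        return "most preferred"
    if 10 <= retained <= 18:
        return "preferred"
    if 5 <= retained <= 18:
        return "within stated range"
    return "outside stated range"


# Hypothetical placeholder sequences (NOT the real AAV D-sequence).
native_d = "ACGTACGTACGTACGTACGT"
modified_d = "ACGTACGTAC" + "TACGTACGTA"  # last 10 positions substituted
print(retained_native_count(native_d, modified_d), classify_modification(native_d, modified_d))
```

In this placeholder example the first 10 positions are unchanged and each of the last 10 carries a
nucleotide different from the native one, so 10 native nucleotides are retained.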

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred 
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase 
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional 
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441 
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19 
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited 
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260. 

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors, togavirus vectors, and vectors derived from
Semliki Forest virus (ATCC VR-67; ATCC VR-1247), Middleberg virus (ATCC VR-370), Ross River
virus (ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).
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DNA vector systems such as eukarytic layered expression systems are also useful for expressing 
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered 
expression systems. Preferably, the eukaryotic layered expression systems of the invention are 
derived from alphavirus vectors and most preferably from Sindbis viral vectors. 

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for 
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol. 
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990) 
J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC 
VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; 
Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 
4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in 
Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for 
example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics 
techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805; 
Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael 
(1983) NEJMed 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human 
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol. 66:2731; 
measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura 
virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240; 
Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and 
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC 
VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for 
example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu 
virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; 
Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for 
example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example 
ATCC VR-375; O'Nyong virus, Eastern encephalitis virus, for example ATCC VR-65 and ATCC 
VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622 
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre 
(1966) Proc Soc Exp Biol Med 121:190. 

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral 
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid 
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for 
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther 
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987, 
eukaryotic cell delivery vehicle cells, for example see US Serial No. 08/240,030, filed May 9, 
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials, 
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as 
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell 
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and 
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585. 

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. 
Briefly, the sequence can be inserted into conventional vectors that contain conventional control 
sequences for high level expression, and then incubated with synthetic gene transfer molecules such 
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting 
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem. 
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose 
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin. 

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the 
beads. The method may be improved further by treatment of the beads to increase hydrophobicity and 
thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796, 
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral 
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional 
vectors that contain conventional control sequences for high level expression, and then be incubated 
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine, 
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin, 
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate 
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active 
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such 
as the approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. USA 
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be 
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods 
for gene delivery that can be used for delivery of the coding sequence include, for example, use of 
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for 
activating transferred gene, as described in US 5,206,152 and WO92/11033. 

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, 
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem 
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420. 

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy 
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will 
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs 
in the individual to which it is administered. 

Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be (1) administered directly 
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) used in vitro for expression 
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects 
can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial 
space of a tissue. The compositions can also be administered into a lesion. Other modes of 
administration include oral and pulmonary administration, suppositories, and transdermal or 
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage 
treatment may be a single dose schedule or a multiple dose schedule. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known 
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications 
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic 
cells, or tumor cells. 

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished 
by the following procedures, for example, dextran-mediated transfection, calcium phosphate 
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of 
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well 
known in the art. 

Polynucleotide and polypeptide pharmaceutical compositions 

In addition to the pharmaceutically acceptable carriers and salts described above, the following 
additional agents can be used with polynucleotide and/or polypeptide compositions. 

A. Polypeptides 

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR); 
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons; 
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating 
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and 
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from 
other invasive organisms can be used, such as the 17 amino acid peptide from the circumsporozoite protein of 
Plasmodium falciparum known as RII. 

B. Hormones, Vitamins, etc. 

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, 
thyroid hormone, or vitamins, folic acid. 

C. Polyalkylenes, Polysaccharides, etc. 

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a 
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or 
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is 
dextran or DEAE-dextran. Also, chitosan and poly(lactide-co-glycolide) can be included. 

D. Lipids, and Liposomes 

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes 
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or 
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary 
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the 
use of liposomes as carriers for delivery of nucleic acids, see, Hug and Sleight (1991) Biochim. 
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527. 

Liposomal preparations for use in the present invention include cationic (positively charged), 
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to 
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified 
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form. 

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium 
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand 
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include 
Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be 
prepared from readily available materials using techniques well known in the art. See, eg. Szoka 
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis 
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes. 

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids 
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include 
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC), 
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others. 
These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate 
ratios. Methods for making liposomes using these materials are well known in the art. 

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), 
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared 
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka 
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; 
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA 
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol. 
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145; 
and Schaefer-Ridder (1982) Science 215:166. 

E. Lipoproteins 

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. 
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, 
fragments, or fusions of these proteins can also be used. Also, modifications of naturally occurring 
lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the delivery of 
polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with 
the polynucleotide to be delivered, no other targeting ligand is included in the composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are 
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and 
identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII, 
AIV; CI, CII, CIII. 

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring 
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire C and 
E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and 
HDL comprises apoproteins A, C, and E. 

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) 
Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem 
261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232. 

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and 
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example, 
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of 
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The 
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding 
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and 
association with the polynucleotide binding molecule. 

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. 
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460 
and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or 
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example, 
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 
30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical 
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be 
found in Zuckermann et al. PCT/US97/14465. 

F. Polycationic Agents 

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are 
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired 
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can 
be used to deliver nucleic acids to a living subject intramuscularly, subcutaneously, etc. 

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine, 
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, 
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such 
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful 
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos, 
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that 
bind DNA sequences. 

Organic polycationic agents include: spermine, spermidine, and putrescine. 

The dimensions and the physical properties of a polycationic agent can be extrapolated from the 
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic 
agents. 

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. 
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when 
combined with polynucleotides/polypeptides. 

Immunodiagnostic Assays 

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or, 
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based 
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods. 
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum 
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and 
a variety of these are known in the art. Protocols for the immunoassay may be based, for example, 
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use 
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody 
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye 
molecules. Assays which amplify the signals from the probe are also known; examples of which 
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such 
as ELISA assays. 

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed 
by packaging the appropriate materials, including the compositions of the invention, in suitable 
containers, along with the remaining reagents and materials (for example, suitable buffers, salt 
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions. 

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution. 
Then, the two sequences will be placed in contact with one another under conditions that favor 
hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction 
temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid 
phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences; 
use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene 
glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al. 
[supra] Volume 2, chapter 9, pages 9.47 to 9.57. 

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt 
concentration should be chosen so that it is approximately 12° to 20°C below the calculated Tm of the 
hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al. at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the 
DNA being blotted and (2) the homology between the probe and the sequences being detected. The 
total amount of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 µg for a 
plasmid or phage digest to 10^-9 to 10^-8 g for a single copy gene in a highly complex eukaryotic 
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and 
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes 
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only 
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with 
a probe of 10^8 cpm/µg. For a single-copy mammalian gene a conservative approach would start 
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate 
using a probe of greater than 10^8 cpm/µg, resulting in an exposure time of ~24 hours. 

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe 
and the fragment of interest, and consequently, the appropriate conditions for hybridization and 
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly 
encountered variables include the length and total G+C content of the hybridizing sequences and 
the ionic strength and formamide content of the hybridization buffer. The effects of all of these 
factors can be approximated by a single equation: 

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch) 

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284). 
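
As a worked illustration, the equation above can be evaluated directly. The following Python sketch is not part of the original disclosure: the function name `melting_temperature` and its parameter names are assumptions chosen for illustration, while the coefficients are those of the equation as given here.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (in degrees C) of a DNA-DNA hybrid.

    salt_molar        -- Ci, monovalent ion concentration (mol/L)
    gc_percent        -- %(G + C) of the hybridizing sequence
    formamide_percent -- % formamide in the hybridization buffer
    length_bp         -- n, length of the hybrid in base pairs
    mismatch_percent  -- % mismatch between probe and fragment
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical inputs: 600 bp hybrid, 50% G+C, 1 M salt, 50% formamide
    print(melting_temperature(1.0, 50.0, 50.0, 600))
```

With these hypothetical inputs the terms are 81 + 0 + 20 - 30 - 1 - 0, ie. an approximate Tm of 70°C; each 1% of mismatch lowers the estimate by a further 1.5°C.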

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration 
during the washes are the simplest to adjust. As the temperature of the hybridization (ie. the 
stringency) increases, it becomes less likely for hybridization to occur between strands that are 
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely 
homologous with the immobilized fragment (as is frequently the case in gene family and 
interspecies hybridization experiments), the hybridization temperature must be reduced, and 
background will increase. The temperature of the washes affects the intensity of the hybridizing 
band and the degree of background in a similar manner. The stringency of the washes is also 
increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for 
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, 
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered 
and temperature adjusted accordingly, using the equation above. If the homology between the probe 
and the target fragment is not known, the simplest approach is to start with both hybridization and 
wash conditions which are nonstringent. If non-specific bands or high background are observed 
after autoradiography, the filter can be washed at high stringency and reexposed. If the time 
required for exposure makes this approach impractical, several hybridization and/or washing 
stringencies should be tested in parallel. 
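
The temperature guidance in this paragraph can be expressed as a simple lookup. The Python sketch below is illustrative only: the function name is hypothetical, and because the stated ranges share their end points (95% and 90%), assigning each boundary value to the higher temperature band is an assumption made here, not something the text specifies.

```python
def hybridization_temp_50pct_formamide(homology_percent):
    """Return a convenient hybridization temperature (degrees C) in 50% formamide.

    Returns None below 85% homology, where the text advises lowering the
    formamide content and adjusting the temperature using the Tm equation.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:   # 95% to 100% homology
        return 42
    if homology_percent >= 90:   # 90% to 95% homology
        return 37
    if homology_percent >= 85:   # 85% to 90% homology
        return 32
    return None                  # no fixed recommendation in the text


if __name__ == "__main__":
    for h in (98, 92, 86, 70):
        print(h, hybridization_temp_50pct_formamide(h))
```

When the homology is unknown, no lookup applies; as stated above, the practical course is to begin with nonstringent conditions and rewash at higher stringency if needed.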

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said 
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected. 

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual 
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and 
so a cDNA probe should be complementary to the non-coding sequence. 
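
The strand relationships described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names and the six-base sequence `ATGGCC` are hypothetical, sequences are written 5' to 3' as DNA, and "cDNA" is taken to mean single-stranded first-strand cDNA as in the text.

```python
# Translation table for DNA base complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_seq):
    """A probe for mRNA is complementary to the coding sequence."""
    return reverse_complement(coding_seq)


def probe_for_first_strand_cdna(coding_seq):
    """First-strand cDNA is complementary to mRNA, so a probe for it is
    complementary to the non-coding strand, ie. it reads as the coding sequence."""
    return reverse_complement(reverse_complement(coding_seq))


if __name__ == "__main__":
    coding = "ATGGCC"  # hypothetical fragment of a coding sequence
    print(probe_for_mrna(coding))               # GGCCAT
    print(probe_for_first_strand_cdna(coding))  # ATGGCC
```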

The probe sequence need not be identical to the Neisserial sequence (or its complement) — some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe 
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can 
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may 
also be helpful as a label to detect the formed duplex. For example, a non-complementary 
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe 
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases 
or longer sequences can be interspersed into the probe, provided that the probe sequence has 
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and 
thereby form a duplex which can be detected. 

The exact length and sequence of the probe will depend on the hybridization conditions, such as 
temperature, salt condition and the like. For example, for diagnostic applications, depending on the 
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20 
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be 
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable 
hybrid complexes with the template. 

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al. 
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl. Acad. Sci. USA 
(1983) 80:7461], or using commercially available automated oligonucleotide synthesizers. 

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg. 
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase 
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al. (1993) TIBTECH 11:384-386]. 

Alternatively, the polymerase chain reaction (PCR) is a well-known means for detecting 
small amounts of target nucleic acids. The assay is described in: Mullis et al. [Meth. Enzymol. 
(1987) 155:335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize 
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence 
that does not hybridize to the sequence of the amplification target (or its complement) to aid with 
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such 
sequence will flank the desired Neisserial sequence. 
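
The use of non-hybridizing primer sequence to introduce flanking restriction sites can be illustrated as follows. This Python sketch is an assumption-laden example, not the method of the Examples: the target sequence, the 9-base annealing length and the function names are hypothetical, and the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites are used only as familiar examples of a "convenient restriction site".

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(target, anneal_len, fwd_tail, rev_tail):
    """Build a primer pair whose 3' ends anneal to the ends of `target`
    and whose 5' tails (non-hybridizing) carry the given sequences."""
    target = target.upper()
    if not 0 < anneal_len <= len(target):
        raise ValueError("annealing length must fit within the target")
    # Forward primer: tail + first bases of the target strand.
    forward = fwd_tail + target[:anneal_len]
    # Reverse primer: tail + reverse complement of the last bases.
    reverse = rev_tail + reverse_complement(target[-anneal_len:])
    return forward, reverse


TARGET = "ATGACCAAAGGTCTGCCGTTAGCATAA"  # hypothetical target, not a Neisserial sequence
ECORI_SITE = "GAATTC"
BAMHI_SITE = "GGATCC"

forward, reverse = tailed_primers(TARGET, 9, ECORI_SITE, BAMHI_SITE)

if __name__ == "__main__":
    print(forward)  # GAATTCATGACCAAA
    print(reverse)  # GGATCCTTATGCTAA
```

After amplification, the restriction sites carried by the 5' tails flank the amplified target sequence. In practice the annealing portions would normally be longer than the 9 bases used in this toy example.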

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids are 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial 
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook 
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid 
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed 
to remove any unhybridized probe. Next, the duplexes containing the labelled probe are detected. 
Typically, the probe is labelled with a radioactive moiety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for 
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2 
are molecular weight markers. Arrows indicate the position of the main recombinant product or, 
in Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates 
N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle 
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲) 
shows GST control data; a circle (•) shows data with recombinant N. meningitidis protein. 
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an 
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et 
al. (1989) J. Immunol. 143:3007; Roberts et al. (1996) AIDS Res Hum Retrovir 12:593; Quakyi et 
al. (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc. 
(1228 South Park Street, Madison, Wisconsin 53715 USA). 

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along 
with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic acid 
sequences are complete ie. they encode less than the full-length wild-type protein. 

The examples are generally in the following format: 

• a nucleotide sequence which has been identified in N. meningitidis (strain B) 

• the putative translation product of this sequence 

• a computer analysis of the translation product based on database comparisons 

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in 
N. gonorrhoeae 

• a description of the characteristics of the proteins which indicates that they might be 
suitably antigenic 

• results of biochemical analysis (expression, purification, ELISA, FACS etc.) 






The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as 
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used 
to compare the ORFs (from GCG Wisconsin Package, version 9.0). 

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way,
double-underlined nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11)
represent ambiguities which arose during alignment of independent sequencing reactions (some of
the nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al. [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI). 

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention: 

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% sucrose, 50mM Tris-HCl, 50mM EDTA, pH 8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamyl alcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
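
The text does not state how the OD reading was converted into a concentration. The following is a minimal sketch that assumes the conventional factor of roughly 50 µg/ml of double-stranded DNA per OD260 unit (1 cm path length); the function names and the dilution parameter are illustrative and are not taken from the protocol.

```python
# Sketch only: the 50 ug/ml per OD260 unit factor is the usual convention for
# double-stranded DNA and is an assumption here, not a value from the protocol.

def dsdna_conc_ug_per_ml(od260, dilution_factor=1.0, ug_per_ml_per_od=50.0):
    """Estimate the dsDNA concentration of the undiluted stock in ug/ml."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("OD260 must be >= 0 and dilution_factor must be > 0")
    return od260 * dilution_factor * ug_per_ml_per_od


def total_yield_ug(od260, dilution_factor, volume_ml):
    """Total DNA recovered, e.g. in the 4ml of Tris-EDTA buffer used above."""
    return dsdna_conc_ug_per_ml(od260, dilution_factor) * volume_ml
```

For instance, a reading of 0.5 on a 1:10 dilution of the 4ml preparation would correspond to about 250 µg/ml, or about 1 mg of DNA in total, under this assumption.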

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF, 
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A 
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately 
downstream from the predicted leader sequence. 

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
an XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tail: CGCGGATCCCATATG   (BamHI-NdeI)
                    CGCGGATCCGCTAGC   (BamHI-NheI)
                    CCGGAATTCTAGCTAGC (EcoRI-NheI)
3'-end primer tail: CCCGCTCGAG        (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail: GGAATTCCATATGGCCATGG (NdeI)
5'-end primer tail: CGGGATCC             (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail: GATCAGCTAGCCATATG (NheI)
3'-end primer tail: CGGGATCC          (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)              (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N     (whole primer)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
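
The two melting-temperature formulae can be sketched in code as follows. This is an illustrative implementation only; the function names are assumptions, N is taken to be the primer length in nucleotides, and the hybridising region used in the usage note is hypothetical rather than a primer from Table I.

```python
# Sketch of the two Tm formulae given above. Function names are illustrative.

def tm_hybridising_region(seq):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region (tail excluded)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(seq):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer (tail included)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty primer")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n


# Hypothetical example: BamHI-NdeI tail followed by an invented hybridising region.
TAIL = "CGCGGATCCCATATG"
HYBRIDISING = "GCGGATGACGTATCGGAT"  # hypothetical, for illustration only
tm_region = tm_hybridising_region(HYBRIDISING)
tm_primer = tm_whole_primer(TAIL + HYBRIDISING)
```

In practice the hybridising region would be lengthened or shortened until both values fall in the ranges quoted above (50-55°C for the region alone, 65-70°C for the whole oligo).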

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain
cases, it will be noted that the sequence of the primer does not exactly match the sequence in the
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be
appreciated that, once the complete sequence has been identified, this approach is generally no
longer necessary.
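
As an illustration of the codon changes listed above, the following sketch applies them codon by codon to an in-frame coding sequence. The function name and the input sequence in the usage note are hypothetical; the substitution table itself is taken from the text. Each change is synonymous, so the encoded amino acid is unaffected.

```python
# Codon substitutions listed in the text (gonococcal codon -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Apply the substitutions to an in-frame coding sequence.

    The sequence is split into codons starting at position 0, so triplets
    that merely span two codons (out of frame) are left unchanged.
    """
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)
```

For example, the hypothetical in-frame fragment ATG CAG GGG TCG would become ATG CAA GGC TCT.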

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTP solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units Taq DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
at the hybridization temperature calculated for the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed at the hybridization temperature of the full-length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

                  Denaturation        Hybridisation         Elongation

First 5 cycles    30 seconds          30 seconds            30-60 seconds
                  95°C                50-55°C               72°C

Last 30 cycles    30 seconds          30 seconds            30-60 seconds
                  95°C                65-70°C               72°C


The elongation time varied according to the length of the ORF to be amplified. 
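
The double-step cycling scheme can be written down as a small data structure, for example to estimate the programmed hold time of a run. This is a sketch under stated assumptions: the class and function names are illustrative, ramp times between temperatures are ignored, and the annealing temperatures passed in are whatever was calculated for the primer pair in question.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cycles: int
    denature_s: int       # at 95 C
    anneal_s: int         # at anneal_temp_c
    extend_s: int         # at 72 C
    anneal_temp_c: float


def build_program(tail_free_tm_c, whole_primer_tm_c, extend_s=60):
    """Two-stage program: 5 cycles at the tail-free Tm, then 30 at the whole-primer Tm."""
    return [
        Stage("first 5 cycles", 5, 30, 30, extend_s, tail_free_tm_c),
        Stage("last 30 cycles", 30, 30, 30, extend_s, whole_primer_tm_c),
    ]


def hold_time_minutes(program, hot_start_s=180, final_extension_s=600):
    """Sum of programmed holds (3 min hot start, cycles, 10 min final extension)."""
    cycling = sum(
        s.cycles * (s.denature_s + s.anneal_s + s.extend_s) for s in program
    )
    return (hot_start_s + cycling + final_extension_s) / 60
```

With a 60 second elongation step, the programmed holds add up to 83 minutes; the actual run is longer because of temperature ramping, which this sketch does not model.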

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR 
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose 
gel and the size of each amplified fragment compared with a DNA molecular weight marker. 

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the band of the correct size was then eluted and purified from the gel, using the
Qiagen Gel Extraction Kit, following the instructions of the manufacturer. The final volume of the
DNA fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
  as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
  protein as an N-terminus GST fusion

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
  of the protein as an N-terminus His-tag fusion

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
  the protein as an N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in a final volume of either 30 or 40µl in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-His A, and pGex-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine 
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the 
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both pET22b
and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using 0.5µl
of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
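
A 3:1 fragment/vector molar ratio translates into a mass of insert that depends on the relative lengths of the two molecules. The sketch below shows the usual calculation; the function name and the example sizes are hypothetical and are not taken from the protocol.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = molar_ratio * vector_ng * insert_bp / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("lengths must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp
```

For example, with a hypothetical 5000 bp digested vector used at 50 ng and a 600 bp ORF fragment, a 3:1 molar ratio corresponds to 18 ng of fragment.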

In order to introduce the recombinant plasmid in a suitable strain, 100µl E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). Positive clones were
identified on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD600 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until the OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.
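
The 26kDa correction can be illustrated with a short sketch that also reports which M1 marker bands should bracket the expected fusion product on the gel. The function names and the 20kDa example protein are hypothetical; the marker sizes and the GST mass are those given above.

```python
GST_KDA = 26.0
# Biorad broad range standard (M1), sizes in kDa as listed in the text.
M1_MARKERS_KDA = [200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5]


def expected_fusion_kda(protein_kda):
    """Expected apparent size of a GST fusion: protein MW plus 26 kDa for GST."""
    return protein_kda + GST_KDA


def bracketing_markers(mw_kda, markers_kda):
    """Return the (nearest lower, nearest higher) marker bands around mw_kda."""
    below = [m for m in markers_kda if m <= mw_kda]
    above = [m for m in markers_kda if m >= mw_kda]
    return (max(below) if below else None, min(above) if above else None)
```

A hypothetical 20kDa ORF product would thus be expected at about 46kDa as a GST fusion, between the 45 and 66.2kDa bands of M1.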

I) His-fusion solubility analysis (ORFs 111-129) 

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached an OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
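
The concentration formula above is straightforward to apply in code. The sketch below is illustrative; the function name and the optional dilution factor are assumptions added for convenience.

```python
def protein_mg_per_ml(od280, od260, dilution_factor=1.0):
    """Protein (mg/ml) = 1.55 x OD280 - 0.76 x OD260, scaled by any dilution.

    The OD260 term corrects for nucleic acid absorbance at 280 nm.
    """
    return (1.55 * od280 - 0.76 * od260) * dilution_factor
```

For example, readings of OD280 = 1.0 and OD260 = 0.5 on an undiluted sample give approximately 1.17 mg/ml.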

L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5 M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were left to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune sera.
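
The positivity criterion can be expressed as a simple comparison. This sketch assumes that "2.5 times" is meant as a minimum (i.e. at least 2.5-fold over the pre-immune reading); the function name is illustrative.

```python
def elisa_positive(od490_immune, od490_preimmune, fold=2.5):
    """True if the immune serum OD490 is at least `fold` times the pre-immune OD490.

    Reading the criterion as a threshold ("at least 2.5 times") is an assumption.
    """
    if od490_preimmune <= 0:
        raise ValueError("pre-immune OD490 must be positive")
    return od490_immune >= fold * od490_preimmune
```

For example, an immune reading of 1.0 against a pre-immune reading of 0.3 is positive (threshold 0.75), whereas 0.5 against 0.3 is not.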

O) FACScan bacteria binding assay procedure

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes, each containing 8ml Mueller-Hinton Broth (Difco) with 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
left to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and left to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD520 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.
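
The text does not give a formula for turning the two colony counts into a bactericidal readout. One common way to summarise such data, shown here purely as an assumed illustration, is the percentage of colony-forming units surviving after 1 hour relative to time 0; the function name is hypothetical.

```python
def percent_survival(cfu_time0, cfu_time1):
    """Colonies at 1 hour as a percentage of colonies at time 0.

    Assumed summary statistic; the protocol only states that colonies at
    both time points were counted.
    """
    if cfu_time0 <= 0:
        raise ValueError("time 0 count must be positive")
    return 100.0 * cfu_time1 / cfu_time0
```

Comparing this value for wells with normal complement against wells with heat-inactivated complement would indicate whether any killing is complement-dependent.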

Table II (page 493) gives a summary of the cloning, expression and purification results.

Example 1

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 1>: 

1 ATGAAACAGA CAGTCAA . AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 A. GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TAT . TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA

501 AGACCG... 

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM 

51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 

151 AQNNLGVMYA ERXRVRQD... 

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM 

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA 

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY* 
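
The amino acid sequences in this example follow from reading each DNA sequence in codon triplets from its first base. As a minimal sketch, assuming the standard genetic code (the helper name `translate` is hypothetical, and reporting any codon that contains an unread or ambiguous base as `X` is an assumption that mirrors the `X` residues in the listings), the correspondence can be checked as follows:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA listing in reading frame 1.

    Whitespace and position numbers are ignored. A codon containing anything
    other than A, C, G or T (for example '.' or 'N') is reported as 'X'.
    A trailing incomplete codon is dropped.
    """
    seq = "".join(ch for ch in dna.upper() if not ch.isspace() and not ch.isdigit())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# Start and end of SEQ ID 3 as printed above.
print(translate("1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))  # MKQTVKWLAA
print(translate("GGTTATTGA"))                            # GY*
```

The first call reproduces the opening ten residues of SEQ ID 4 and the second reproduces its C-terminus, including the stop codon shown as `*`.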

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 



The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 

I I I I I I I I I I I : II I I : I I : I I I : I 

orf37a MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD 



orf37a RALAQEWLGKACQNGYQDSCDNDQRLKAGYX 

70 80 90 

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM 

51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 

orf37.pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 60 

M Mill: II 111111,11 II II :|||:|| : l|:| 

orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 60 

orf37.pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG 120 
50 :: I I : I I I :: I I I I I I I I I I I I I : I I I I I I : I : I : I : I 

orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ 120 

orf37.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD 168 

orf37ng RLKAGY 126 
The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 

10 20 30 40 50 60 

orf37-1.pep MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD 

I I I I I I I I I I I I: I I I I I I I I I 1111:111:11 : I : I I I : I 

orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf37-1.pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG 

::||:|||:|:|ll MINIM II MINIM 
orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 



orf37-1.pep 

I I I I : I : I I I I 

orf37ng LALAQQWLGKAC 

orf37-1.pep 

I I I I I :: I I I I I I I I I I I I 
orf37ng QNGDQNSCDNDQRLKAGYX 
110 120 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 
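
The source does not say which scale or software produced the hydrophilicity plot. As a rough illustration only, assuming the Hopp-Woods hydrophilicity scale and a six-residue sliding window (both are assumptions, and the function name `hydrophilicity_profile` is hypothetical), a profile of this kind can be computed as a moving average of per-residue values. The antigenic index and AMPHI predictions are separate methods and are not covered by this sketch.

```python
# Hopp-Woods hydrophilicity values per residue (assumed scale).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(protein, window=6):
    """Sliding-window mean hydrophilicity along a protein sequence.

    Whitespace and a trailing '*' are ignored; unknown residues such as 'X'
    contribute 0.0. Returns one value per window position.
    """
    seq = "".join(protein.split()).upper().rstrip("*")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


# First 30 residues of ORF37-1 (SEQ ID 4).
profile = hydrophilicity_profile("MKQTVKWLAA ALIALGLNRA VWADDVSDFR")
print(len(profile), min(profile), max(profile))
```

In a profile like this, strongly negative stretches near the N-terminus are consistent with a hydrophobic leader sequence, while positive peaks mark candidate surface-exposed regions.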

Example 2 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 

GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 

TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 

ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 

GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 

CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 

TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 



This corresponds to the amino acid sequence <SEQ ID 10>: 



WO 99/24578 PCT/IB98/01665 
-63- 

101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H.influenzae protein (yrbd.haein; accession number P45029) 

SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

yrbd.h LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGVVIGRVSAITLDE 

|::||||||:||:| : I I : : I I I : I I : I I 
N.m FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

10 20 30 

80 90 100 110 120 130 

yrbd.h KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT 

N.m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG---GDTENLAAGDTISVT 

40 50 60 70 80 

140 150 160 

yrbd.h TSAMVLEDLIGQFL--YGSKKSDGNEKSESTEQ 
: I I I I I I : I I I : I : :::|::||:: ::::|: 
N.m SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX 
90 100 110 120 

Homology with a predicted ORF from N. gonorrhoeae 

SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 

20 30 40 50 60 70 

yrbd GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
                                   |||||||||||||||||||||||||||||| 
N.m                                FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

10 20 30 

80 90 100 110 120 130 

yrbd KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
N.m  KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 

yrbd VLENLIGKFMTSFAEKNAEGGNAEKAAEX 
     ||||||||||||||||||:|||||||||| 
N.m  VLENLIGKFMTSFAEKNADGGNAEKAAEX 

100 110 120 
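
The identity figures quoted for these alignments are the share of aligned columns in which both sequences carry the same residue. A minimal sketch, assuming that every aligned column (gaps included) counts toward the overlap, and using a hypothetical helper name `percent_identity`:

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical, overlap, percent) for two equal-length aligned strings.

    Assumption: the overlap is the number of aligned columns, gaps included.
    Gap characters ('-') never count as identical.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    overlap = len(aligned_a)
    return identical, overlap, 100.0 * identical / overlap


# Final block of the alignment with the predicted N. gonorrhoeae ORF.
gonococcal = "VLENLIGKFMTSFAEKNAEGGNAEKAAEX"
meningococcal = "VLENLIGKFMTSFAEKNADGGNAEKAAEX"
identical, overlap, pct = percent_identity(gonococcal, meningococcal)
print(identical, overlap, round(pct, 1))  # 28 29 96.6
```

Over the whole alignment the two sequences differ at a single residue (E versus D), so 117 identical positions out of 118 give the 99.2% stated above.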

The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the full- 
length homologous N. meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 

Example 3 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>: 

1 . . ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 
401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 

451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 
501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 
551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 
601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 
651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 
701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 

751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 
801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 
851 AAGCGGTCG . . 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 

1 . . ILIYLI RKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 
51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 

101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 
151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 
201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 

251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV . . 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 

301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 

651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 

701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 

1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LI RKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 

101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFACDVWYI DHFS LCLDIK ILLLTVKKVL I KEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 

251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 

301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 

351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 

401 KPLPRKNPET STA* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 
ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 



SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I : I : I MM M M M M M M M M M I II I I I I I :l I hi I! I I I I III I I I I I 1 
SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 



YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 



IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 

I I II I I I I II Ml I II II I I I : I I I I M II I I I 

IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 



280 

orf3.pep VGQGSVVMAKAV 

orf3a VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 

310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCG? TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 

1 MSKFFKRLFD IVAS ASGLIF LSPVFLILIY LI RKNLGSPV FFFQERPGKD 

51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 

101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 ERFACDIWYI DHFS LCLDIK ILLLTVKKVL I KEGISAQGE ATMPPFTGKR 

201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 



MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 



orf3a.pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 

11:11111111 I I I I I I I I II I I I I I I I I : I I I : I I I I I I II I I I I I I I I 

orf3-1 SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf3a.pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 

I I I I I I I I I I I I I I I I I I I I I: I I I I: I I I I I I I I I I I I I I I I I I I I I I I 

orf3-1 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf3a.pep IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I : I I I I I I 

orf3-1 IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf3a.pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 

I : I : I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I 

orf3-1 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf3a.pep VGQGGVVMAKAVVQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
I I I I : I II I I I I I II I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I II : I I I I I I 
orf3-1 VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

orf3a.pep IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLAGKNTETLRSX 
I II I I I I II I I I I I I M I I I I ! I I I I I ! I I I I I I II I II 1 I I I II II 

orf3-1 IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX 
370 380 390 400 410 

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 
I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
yvfc 27 IAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 ASXDELPELWNILKGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122 
S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
yvfc 87 LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 146 

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172 
W++KF DVWY+D++S LD EGI T FTG 
yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196 
Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. 
gonorrhoeae: 


orf3 ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 34 

: I I I I I I I I I I I I I I : : I I I I I I I ! I I I I I I I I 
orf3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 60 

orf3 SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 94 

Mil: I I E I : I I I I I: I I I I I I I I I : I I I I I I I II 

orf3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

orf3 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 

I :: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I : I I : I I : I I I : I I I I I I I 
orf3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

orf3 IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 214 

I I I I I I I I I I I I I I I : I : II I I I : I I I I I I I I I I : I I I I I I I I I II I I I I : I I I I I I 
orf3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 240 

orf3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

IIIIIII:|::|||IHIIIIII:|: I I I I I : I I I I I I I I I I 

orf3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

orf3 VGQGSVVMAKAV 286 
Mill 

orf3ng IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 

The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCT m T TGATGCAGTA 

351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 

This encodes a protein having amino acid sequence <SEQ ID 18>: 

1 MSKAVKRLFD IIAS ASGLIV LSPVFLVLIY LI RKNLGSPV FFIRERPGKD 

51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 

101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 

151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 

201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 

251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 

301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 

351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA 

401 KPLTGKNPKT GTA* 
This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 

10 20 30 40 50 60 

MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

I I I ! : I Ilhlllll I : : I I I I ! I I I I 

MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

70 80 90 100 110 120 

SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I I I I I I ! I I I II I I : I I I I llllllhlllllllllhlllMMIIIIIMIIIMI 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

130 140 150 160 170 180 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

I :: I I I I I I I I II I I I I I I I I I I I I I I I : |:||: I I : I I I : I I I I II I 

YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 
130 140 150 160 170 180 

190 200 210 220 230 240 

IKEGISAQGEATMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 

I I : I : I I ! I I I I I I : I I I I I I I I I I : II I I I I 

IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 
190 200 210 220 230 240 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

I Ml Ill ::| 11111:1:1 Nihil II 

FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 
250 260 270 280 290 300 

310 320 330 340 350 360 

VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 
: II I I I I I I I I II I I I I I I I I I II I I I I 1 I I I I I 1 I I : I I I I I I I I I I I I I I I : I I I I I 
IGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 
310 320 330 340 350 360 

370 380 390 400 410 

IGTGACSRQQIRIGSRATIGAGAVVVRDVSDGMTVAGNPAKPLPRKNPETSTAX 

I I 1 i I 1 t I I I :H :| 11111:1 I: I Mill llhhlll 

IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 
370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 

>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 

Score = 235 bits (594), Expect = 3e-61 

Identities = 114/195 (58%), Positives = 142/195 (72%) 







Query: 5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64 
+KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D 
Sbjct: 3 LKRLFDLTAAIFLLCCTSVIILFTIAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62 

Query: 65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124 
DS G LPD RLT G+ +R S+DELP+L NVLKG++SLVGPRPLLM YLPLY + 
Sbjct: 63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122 

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184 
Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG 
Sbjct: 123 QARRHEVKPGITGWAQINGRNAISWEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEG 182 

Query: 185 ISAQGEATMPPFAGN 199 
I T F G+ 
Sbjct: 183 IQQTNHVTAERFTGS 197 





The hypothetical product of yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 19>: 

1 . .AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 . . NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 

51 KVLDFS DLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDE FDEDD SADNIHAVSS 

201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 

251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 23>: 



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 

301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 

551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 

601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 

651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 

701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 

751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC GGCCGTTTCT GTACAGTTTA 

851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 



1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

10 20 30 

orf5.pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I I I I I 1111111:1 

orf5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 

40 50 60 70 80 90 

orf5.pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
I I I I I I I : II I I I I I I I :: I I I I I I I I I I I I I : I I I I I I I MM III :|| I 
orf5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 



The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 



MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLE KVLDFSDLEV 
I I I I I I I I II 1 I I I I I I II I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 



70 80 90 100 110 120 

orf5a.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf5a.pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf5a.pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 

orf5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 
190 200 210 220 230 



250 260 270 280 290 300 
orf5a.pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT 
I 1 I I I I I III I I I : I I I I I I I I I I I I I I I I : I I I : I II I I I I I I I I I I I I I I I 
orf5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT 
240 250 260 270 280 290 

Further work identified a partial DNA sequence in N. gonorrhoeae <SEQ ID 25> which encodes 
a protein having amino acid sequence <SEQ ID 26; ORF5ng>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 

551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa 
overlap with the partial gonococcal sequence (ORF5ng): 

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 

I I I I I I I I I II I II I I I I I : I 

orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182 

orf 5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

:lll:||:||:: 111111:111111: Mill I Ml Ml I 

orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf 5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX RRFCTV 131 

II I I I I II I II I I I I I I I I I I I I : I I I II I I I I MUM 
orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in a 
304 aa overlap: 

10 20 30 40 50 60 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
orf5ng-l.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 



EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 



orf 5ng-l . pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP 
I I I I I II I I I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I I I : II I I I I I 

orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS TAVSAQFRMTVRAFSVS IRP 

240 250 260 270 280 290 

orf5ng-l.pep IRQTX 

orf 5-1 IRQTX 
300 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and 
identified the following homologies: 

Homology with hemolysin homolog TlyC (accession U32716) of H. influenzae 
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp). 

ORF5 2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61 

HMAIV+DE+G SGLVT EDI+EQIVG+IEDEFDE++ AD I +S T+ + A T+I+D 
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224 

ORF5 62 INTFFGTEYSIEEADTI 78 

N F T++ EE DTI 
TlyC 225 FNAQFNTDFDDEEVDTI 241 
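
The identity percentages quoted for these alignments can be recomputed from any pair of gapped, equal-length alignment rows. The following is a minimal sketch only: the function name `percent_identity` is hypothetical, and the conventions it uses (identity divided by the number of aligned columns, with gap characters and the unknown residue `X` never counted as matches) are assumptions that may differ from the FASTA and BLASTp programs that produced the figures in this document.

```python
def percent_identity(row_a, row_b, gap="-", unknown="X"):
    """Return (matches, columns, percent) for two aligned rows of equal length.

    Assumption: identity is matches / aligned columns, and columns holding a
    gap or an unknown residue are never counted as matches.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == b and a not in (gap, unknown):
            matches += 1
    columns = len(row_a)
    return matches, columns, 100.0 * matches / columns


# Second block of the ORF5 / TlyC alignment (ORF5 62-78 against TlyC 225-241).
orf5_block = "INTFFGTEYSIEEADTI"
tlyc_block = "FNAQFNTDFDDEEVDTI"
matches, columns, pct = percent_identity(orf5_block, tlyc_block)
print(f"{matches}/{columns} identical ({pct:.1f}%)")
```

Applied to every block of an alignment in turn, the match and column counts can be summed before dividing, which gives the overall figure for the whole overlap.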

ORF5ng-1 also shows significant homology with TlyC: 

301 Initn: 419 Opt: 668 
668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 

I II: |::|: : I : | :::::: | :::::::: | :| :| 
tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 
10 20 30 40 50 60 

60 70 80 90 100 109 

orf5ng-l.pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE — DKDEVLGILH 

I:::|||:||| II II:: :::::::: :|::|| I:: |:|:::|||| 

tlyc_haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 
70 80 90 100 110 120 

110 120 130 140 150 160 

orf5ng-l.pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 

tlyc_haein AKDLLKFLREDAEVFDLSSLLRPWIVPESmVDRMLKDFRSERFHMAIWDEFGAVSGL 
130 140 150 160 170 180 

170 180 190 200 210 220 

orf5ng-l.pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 

tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 
190 200 210 220 230 

230 240 250 260 270 280 

orf5ng-l.pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 



Homology with a hypothetical secreted protein from E.coli: 
ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION 
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879 
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an 
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292 

. Gaps = 3/230 (1%) 

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60 

D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V 
Sbjct: 10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69 

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119 
RD MI RS+M LK N +++ +I++AHSRFPVI EDKD + GIL AKDLL +M + 

Sbjct: 70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129 

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179 
E F + +LR AV VPE K + +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV 
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189 

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229 

G+IEDE+DE++ D +S W + A IED N FGT +S EE DT 
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238 

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from 
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or 
diagnostics. 

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows 
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure 1B). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 5 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 29>: 



1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC 

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC. . 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 
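
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in frame with the standard genetic code. The sketch below is illustrative only: the function name `translate` is hypothetical, and it assumes that any codon containing a character other than A, C, G or T (for example the lower-case ambiguity codes `r` and `s` that appear in SEQ ID 29) is reported as the unknown residue `X`, which is how such positions appear in the ORF7 listing.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace and digits (as found in numbered sequence listings) are
    dropped, and codons with non-ACGT characters become 'X' (assumption).
    """
    cleaned = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for pos in range(0, len(cleaned) - 2, 3):
        codon = cleaned[pos:pos + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 29, copied with their listing spacing.
print(translate("ATGCGCGGCG GCAGGCCGGA TTCCGTTACC"))
# A fragment containing the ambiguity code 'r' from the same listing.
print(translate("ATGGCGArCCTGGTCGAA"))
```

Because unknown codons map to `X`, the same routine can be run over a partial sequence without failing on unresolved bases.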

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>: 

1 MLRKLLKWSA VFLTVSAAVF A ALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H. influenzae 
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ IEG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLEMAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115 

N EG +PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221 

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175 

EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 

ORF7 176 RGGLPPTPIALP 187 

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 

The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKILP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N. 
meningitidis: 

10 20 30 

orf 7 .pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
orf 7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQI IEGSRFSHMRKVIDATP 



DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 



EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 
I I I I I I I I I I I I I I I I I I I I I I I :! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 



GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 

GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 



orf7a DGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 
This is predicted to encode a protein having amino acid sequence <SEQ ID 34>: 

1 MLRKLLKWSA VFLTVSAAVF A ALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 

A leader peptide is underlined. 
ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap: 



MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 



HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 



IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 



QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I I I I I I I I I ! I 
QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 
190 200 210 220 230 240 

250 260 270 280 290 300 

PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I I 
PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 

250 260 270 280 290 300 



310 



320 



330 



FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7.ng) from N. 
gonorrhoeae: 



orf7 


MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 

1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 ! 1 

MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 


60 




60 


orf7 


FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 


120 


orf7ng 


FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 


120 


orf7 


HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 

1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 


180 


orf7ng 


180 



orf7 



PTPIALP 



187 
orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236 

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>: 

1 . . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA 

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG 

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC 

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC 

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 

551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC 

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>: 

1 . . YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL 

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG 

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK* 

ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap: 

10 20 30 40 50 60 

orf 7-1 . pep KLLKW SAVFLT VSAAVFAALLFVPKDNGRAYR I KI AKNQG I S S VGRKLAE DRIVFSRHVL 

orf7ng-l YRIKIAKNQGISSVGRKLAEDRIVFSRHVL 



TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 



orf 7-1. pep TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 



190 200 210 220 230 240 

LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 

I I I I I : I II 111:11111 I I I I I I I I I I I I I I I I I I I I I I I 

LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
160 170 180 190 200 210 



orf7-l.pep 



250 260 270 280 290 300 

IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS 
I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I 
IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS 
220 230 240 250 260 270 



KMDGTGLSQFSHDLTEHNAAVRKYILKKX 



In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein: 

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION 
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but 
has 97 additional C-terminal residues [Escherichia coli] Length = 340 

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 

Identities = 20/87 (22%), Positives = 40/87 (45%) 



Query: 10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69 
G ++G +L D+I+ V + + GTYR +++ ++L+ + G+ 
Sbjct: 49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108 

Query: 70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96 
++++EG R S K + P I H 
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135 

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 

Identities = 84/155 (54%), Positives = 111/155 (71%) 

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179 
EG F+PD++ A +D+ + + A+K M + ++ AW GR DGLPYK+ +++ MAS+IEK 
Sbjct: 158 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217 

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239 
ET ++RD VASVF+NRL+ IGMRLQTDP+VIYGMG Y GK+ +ADL T YNTYT 
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277 

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274 
GLPP IA PG ++ AAAHP+ YLYFV+ G 
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312 





Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible 
leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 6 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 



1 . . RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 
51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 
101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 
151 HLDGREEVLA QADEGQ 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>: 

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE 

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL 

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 

451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL 

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 

10 20 30 40 50 

orf 9 . pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
I ! : I : I I : I : I : I I I : II I I : I I I I I I I I I I I I I I I I I I II II I II I I II 
orf 9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
10 20 30 40 50 

60 70 80 90 100 110 

AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
60 70 80 90 100 110 

120 130 140 150 160 

EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 



120 130 140 150 160 170 

AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI 
180 190 200 210 220 230 

The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT 

51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG 

101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT 

201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA 

401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA 

451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG 

501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG 

551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA 

601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA 

651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC 

901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG 

951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA 

1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG 

1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT 

1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA 

1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT 

1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG 

1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 

1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT 

1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 

1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC 

1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 44>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI 

51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG 

151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR 

201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA 

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER 

301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 

351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV 

401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT 

451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS 

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF 

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG 



601 IALPQPSRKP RK* 



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 

10 20 30 40 50 

orf 9a. pep MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 

:|:I|:|:|:MI: III |:| I I I I I MINIM 

orf 9-1 MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 

10 20 30 40 50 60 

60 70 80 90 100 110 

orf 9a . pep AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

orf 9-1 AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
70 80 90 100 110 120 

120 130 140 150 160 170 

orf 9a . pep EMI YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 

I M I II I II I II I II II II II I I Mill I 

orf 9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 

130 140 150 160 170 180 

180 190 200 210 220 230 

orf 9a . pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI 

II I II I II I II I I II I M I I I : II I I I I II II II II I I I I I II II II I I II II I I I I I I 
orf 9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI 

190 200 210 220 230 240 

240 250 260 270 280 290 

orf 9a. pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
II II I II II I II I I II I I I II I I I I I II I II I I II I I II I II II II II I II I II I II I II 
orf 9-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 

300 310 320 330 340 350 

orf 9a . pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 

I I II I II I I I II I II I I II II II I I II II II II I I I I I I I : II I : \ II I : I I I II I I : 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

310 320 330 340 350 360 

360 370 380 390 400 410 

orf 9a . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 

III I I II I II I I I I I I I I I I I I I Ill I I I I I I 

orf 9-1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
370 380 390 400 410 420 

420 430 440 450 460 470 

orf 9a . pep IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 

II I M I II I I M I II M I I I I II II MM 

orf 9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 
430 440 450 460 470 480 

480 490 500 510 520 530 

orf 9a . pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 

I I II I I I I II 1 II I I I : II M I II I I II II I I I I II I II II I I 

orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
490 500 510 520 530 540 

540 550 560 570 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

I I II I II II I II II I I II I II I II M I II II 

orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

550 560 570 580 590 600 

600 610 
orf 9a. pep HGIALPQPSRKPRKX 

II II I I II II II I I I 
orf9-l HGIALPQPSRKPRKX 

610 
Homology with a predicted ORF from N. gonorrhoeae 

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 



Orf9 


RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 

11 :|:||:|:|:|||: II l|:|:: llllllhll:: III! 

MIMLPARFTILSVLAAALLAGQAYAA— GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 


54 


orf 9ng 


58 


orf9 


LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 
1 I I 1 1 II 1 1 : : II 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 
LAAVGERVNRVFTLLGGETALQKGQAGTALA7YMLMLERTKSPEVAERALEMAVSLNAFE 


114 


orf 9ng 


118 


orf9 


QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 


166 


orf 9ng 


QAEMIYQKWRQIE P I PGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQS DYVHQPMI FLLL 


178 



The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein including amino 
acid sequence <SEQ ID 46>: 

1 MIMLPARFTI LSVLAAALLA GQAYAAGA AD VELPKEVGKV LRKHRRYSEE 

51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE 

151 GGNPHLDRLE EVPAQSDYVH QP MIFLLLVQ AAVQHGGVA Q KPSKAVRPAA 

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF 

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT 

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane domain. 
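The source does not state how the transmembrane segment was predicted. As an illustration only, the sketch below scores residues 173-189 of <SEQ ID 46> with the Kyte-Doolittle hydropathy scale; the choice of scale and the helper name `mean_hydropathy` are assumptions, not the method used in the specification.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle score over a peptide segment."""
    return sum(KD[residue] for residue in segment) / len(segment)


# Residues 173-189 of SEQ ID 46, read from the listing above.
TM_SEGMENT = "MIFLLLVQAAVQHGGVA"

print(round(mean_hydropathy(TM_SEGMENT), 2))
```

A clearly positive average is consistent with a membrane-spanning stretch, but a single window average is only a rough indicator.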



Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa 
451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 
501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 
551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 
601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 
651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 
851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 
901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 
951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 

1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 

1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 

1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 

1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 

1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 

1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 

1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 

1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 
1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 
1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 48>: 



1 MLPARFTILS VLAAALLAGQ AYAAGA ADVE LPKEVGKVLR KHRRYSEEEI 
51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE 
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG 
151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK 
201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA 
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH 
301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 
351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV 
401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST 
451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS 
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF 
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG 
601 IALPEPSRKP RK* 



ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 



orf 9-1 . pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 

: I : I I: I: I: I I I : III I : I :: I I I I I I I: I I :: I i I I I 

orf9ng-l MLPARFTILSVLAAALLAGQAYAAG— AADVELPKEVGKVLRKHRRYSEEEIKNERARLA 



orf 9-1 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 



120 

190 200 210 220 230 240 

AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 

: I I I I I I I I I I I I I I I I I i I : I I : I I I I II II 

AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 
180 190 200 210 220 230 



orf 9-1. pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 



310 320 330 340 350 360 

ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

I : I I I I : I I I I I I I I I I I 1 I I I I I I I I I I I I llhllhllll: II 

EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I ! II I I I I I 
KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 



IQMLALSKLPDKREALIGLNNI IAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 



490 500 510 520 530 540 
orf 9-1 .pep 
orf 9ng-l 



RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQ'TAYQINPDDTAVNDSIGWAYYLKGD 



orf 9-1 .pep 
orf 9ng-l 



>rf 9ng-l 



HGIALPQPSRKPRKX 



In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION 
(ORF3) 

>gi|1072999|pir| IS49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259 
(X82071) orf3 [Pseudomonas aeruginosa] Length = 576 
Score = 128 bits (318), Expect = 1e-28 

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%) 



Query: 67     Sbjct: 53 
Query: 127    Sbjct: 113 
Query: 173    Sbjct: 173 
Query: 233    Sbjct: 215 
Query: 288    Sbjct: 271 
Query: 313    Sbjct: 331 
Query: 372    Sbjct: 389 
Query: 432    Sbjct: 409 
Query: 492    Sbjct: 463 
Query: 552    Sbjct: 523 

+++LL E A Q+ - 
LA +K+ A +D YA+ 
++T+ P V+ERA 
-YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 
T ++ A R D A R 
L+ A+++NPDD A+ DS+GW Y +G 
P+ EVAAHLGEVLWA G - 

gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545 
Score = 81.5 bits (198), Expect = 1e-14 

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%) 
Sbjct: 335 

Query: 460 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS-- DSKRLDEGFALLQ 513 
           +I+Y+ G L A++L P+N N LGYSLL +R++E L++ 
Sbjct: 391 VYFMEAIVYDNLGDIKNAEKALRKAIELDPENPDYYNYLGYSLLLWYGKERVEEAEELIK 

Query: 514 TAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSF-ENDPEPEVAAHLGEVLWALGER 572 
           A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G + 
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPWNEHVGDVLLKMGYK 510 

Query: 573 DQAVDVWTQAAHLRGDKK 590 
           ++A + + +A L + K 
Sbjct: 511 EEARNYYERALKLLEEGK 528 





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 7 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CG&CTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>: 



1 .. NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG 
51 W AIIVLTIIV KAVLYPLT NA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ 
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFA SVE LRQAPWLGWI 
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMP LVFSXXF 
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>: 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT 

1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>: 



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL 
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE 
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 
251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK 
301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI 
351 IGNWGW AIIV LTIIVKAVLY PLT NASYRSM AKMRAAAPKL QAIKEKYGDD 
401 RMAQQQAMMQ LYTDEKINPL GGCLP MLLQI PVFIGLYWAL FA SVELRQAP 
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 
501 FSVMFFFFPA GLVLY WVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida 
ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp). 

ORFll 2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61 

LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K 324 LYAGPKIQSKLKELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 383 

ORFll 62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121 

+ +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 

60K 384 GLFFPLSAASYRSMARMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPI 443 

ORFll 122 LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPT 181 

L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
60K 444 LVQMPVFLALYWVLLESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPP 503 

ORFll 182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWVVNNLLT IAQQWHINRSIE 230 

DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE 

60K 504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWWNNCLSISQQWYITRRIE 552 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N. 
meningitidis: 



NLYAGPQTTSVIANIADNLQLAKDYGKVHW 



FASPLFWLLNQLHNIIGNKGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE 
220 230 240 

orf 11 .pep WVVNNLLTIAQQWHINRSIEKQRAQGEVVSX 

orf 11a WVINNLLTIAQQWHINRSIEKQRAQGEVVSX 
520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is: 

1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA 

301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG 

801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA 

901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC 

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT 

1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 54>: 

1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 
51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 
101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH 
251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK 
301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 
351 IGNWGW AIIV LTIIVKAVLY PLT NASYRSM AKMRAAAPKL QAIKEKYGDD 
401 RMAQQQAMMQ LYTDEKINPL GGCLP MLLQI PVFIGLYWAL FA SVELRQAP 
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 
501 XSXXFFXFPA GLVLYW VINN LLTIAQQWHI NRSIEKQRAQ GEVVS* 

ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap: 



WO 99/24578 






PCT/IB98/01665 



XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT 
MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 



DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG 
I I I I I ! I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I III I I I I I II I I I I I I I I 
DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
70 80 90 100 110 120 



IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
130 140 150 160 170 180 



190 200 210 220 230 240 

SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT 

I I I I II I I I I I I I I I I I I I I I I I I I I I Ill 

SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 



310 320 330 340 350 360 

SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 

: II I I I I I I I I I I III I I I I II I I I I II I I I I I I 

AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 

310 320 330 340 350 360 

370 380 390 400 410 420 

LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 
I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II 
LTIIVJCAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

370 380 390 400 410 420 



430 440 450 460 470 480 

orf 11a . pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 

I I I II I I I I I I I I I I I II I I I I I I I II I Ml 

45 orf 11-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 

430 440 450 460 470 480 



490 500 510 520 530 540 

orf 11a . pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLT1AQQWHINRSIEKQRAQ 

50 I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 

orf 11-1 LNPPPTDPMQAKMMKIMPLVFSWFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQ 

490 500 510 520 530 540 



55 orf 11a. pep GEWSX 

I I I I I I 

orfll-1 GEWSX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N. 
gonorrhoeae: 



Orfll NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I: I I I 

65 orfllng MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFAS PLFWLLNQLHNIIGNWGWAIWLT 60 
orfll IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117 

orfllng IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120 

orfll CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177 

orfllng CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180 

orfll PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWWNNLLTIAQQWHINRSIEKQRAQGE 237 

orfllng PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGE 240 

orfll WS 240 

I I I 

orfllng WS 243 

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino 
acid sequence <SEQ ID 56>: 

1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG 
51 NWGW AIVVLT IIVKAVLYPL T NASYRSMAK MRAAAPELQT IKEKYGDDRM 
101 AQQQAMMQLF EDEEINPLGG CLP MLLQIPV FIGLYWALFA SVELRQAPWL 
151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMP LVFS 
201 VMFFFFPAGL VLY WVVNNLL TIAQQWHINR SIEKQRAQGE VVS* 

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC 

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT 

201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 

401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 

451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac 

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg 

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt 

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG 

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 

1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA 

1101 AGCCGTACTG TATCCATTGA CCAACGcctC CtACCGTTCG ATGGCGAAAA 

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC 

1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA 

1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT 

1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA 

1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 

1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC 

1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG 

1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 

1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA 

1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>: 

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK 

151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAI PTRGP 

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN 

351 IIGNWGW AIV VLTIIVKAVL YPLT NAS YRS MAKMRAAAPK LQTIKEKYGD 

401 DRMAQQQAMM QLYKDEKINP LGGCLP MLLQ IPVFIGLYWA LFA SVELRQA 

451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL 

501 VFSVMFFFFP AGLVLY WVVN NLLTIAQQWH INRSIEKQRA QGEVVS* 



ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap: 



■ rf llng-l.pep MD FKRLTAFFAI ALV I M I GWE KM F ?T PK P V PAPQQAAQKQAATASAEAALAPAT P I T VTT 
.rf 11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 



orf llng-l.pep DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG 

I I I I I I I I I I I I I I I I I I I I I I I I I I I : I Ill 

orf 11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
70 80 90 100 110 120 



IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 
1111111111:1:1! I I I I I I I I I I I I I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I 
IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
130 140 150 160 170 180 



orfllng-l.pep 



orf 1 lng-1 .pep 



310 320 330 340 350 360 

orfllng-l.pep KPKI4AVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV 
I : : : II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I : 
orf 11-1 KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII 
300 310 320 330 340 350 



370 



400 



410 



420 



>rf llng-l.pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 



430 440 450 460 470 480 

orfllng-l.pep LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 

50 II | I I I I | I I I I I I I I I I I 

orf 11-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 

420 430 440 450 460 470 

490 500 510 520 530 540 

55 orf llng-1 .pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 

II I I I I I I I I I I I I I I I I I I Ill 

orf 11-1 YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 
480 490 500 510 520 530 

60 

orfllng-l.pep QGEWSX 

orfll-1 QGEWSX 
540 

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the 
database (accession number P25754): 
60IM_PSEPU STANDARD; PRT; 560 AA. 

P25754; 

01-MAY-1992 (REL. 22, CREATED) 
01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 
01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
60 KD INNER-MEMBRANE PROTEIN. . . . 



orfllng-1 .pep 
p25754 



MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

MDIKRTILIAALAVVSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 



orfllng-1 .pep 
p25754 



AATAS AEAALAPAT PIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 



orfllng-1 .pep 
p25754 



VLFGDGKEYTYVAQSELLDAQGNNILKGIG FSAPKKQYTL-NGD TVEVRLSAPE 



orfllng-1 .pep 
p25754 



TNGLKIDKVYTFTKDS YLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 

DNGVNYIKRFSFKRGEYDLNVSYLIDNQSGQAWNGNMFAQLKRDASGDPSSSTATGTATY 
180 190 200 210 220 230 



orfllng-1 . pep 



orfllng-1 . pep QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 
: 1 I :::::: I : : I : : : I : I I : : : Mill: I : : : : 

p25754 NNV VQTRKDSQGNYIIGYTGPVISVPA-GGKVETSALLYAGPKIQSKLKELSP 

290 300 310 320 330 

330 340 350 360 370 380 

orfllng-1. pep NLQLAKDYGKVHWF-ASPLFWLLNQLHNIIGNWGWAIVVLTIIVKAVLYPLTNASYRSMA 
: I : I : III : II I : I : I I I I : : : I : : : I I I I I : I : I I I : : : I : : : : I I : I I I I I I I 
p25754 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 

390 400 410 420 430 440 

orfllng-1 . pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 

p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 



orfllng-1. pep 



510 520 530 540 

orfllng-1 . pep SVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGEWSX 
Based on this analysis, including the homology to an inner-membrane protein from P. putida and 
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 8 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>: 

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA 

251 ACCGTTACGA AGTT.TTTAT CGCGGTACG . ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>: 



1 .. AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT X ALLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>: 

1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGC ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>: 

1 .. AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XA LLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT 

101 GQEELEPGTR ALIVRKEGNL LIITHP* 
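The correspondence between the nucleotide listing and the amino acid listing can be checked by translating in frame with the standard genetic code. The sketch below is illustrative only: the `translate` helper is hypothetical, and the test fragment is nucleotides 301-381 of <SEQ ID 61> as printed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# Nucleotides 301-381 of SEQ ID 61 (position 301 starts codon 101).
TAIL = (
    "GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA "
    "AGGCAACCTT CTTATTATCA CACACCCTTA A"
)

print(translate(TAIL))  # GQEELEPGTRALIVRKEGNLLIITHP*
```

The output matches residues 101-126 of <SEQ ID 62>, including the terminal stop.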

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of N. 
meningitidis: 

10 20 30 40 50 

orfl3.pep AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTX ALLSALGIXF 

orfl3a MTVWFVAAVAVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTA ALLSALGIWF 



VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 
I I I I I I I I I I I I I I I I I I I I I I : I I I I I : I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I 
VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
70 80 90 100 110 120 



LIVRKEGNLLIITHPX 
The complete length ORF13a nucleotide sequence <SEQ ID 63> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 64>: 



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA 
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR 
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP* 

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap 

10 20 30 40 50 60 

orfl3a.pep MTVWFVAAVAVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 

I I I I I I I I I I I I I I I I I I I I I I 

orfl3-l AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 
10 20 30 40 50 

70 80 90 100 110 120 

or f 13a . pep VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 

orfl3-l VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 
60 70 80 90 100 110 

130 

orfl3a.pep LIVRKEGNLLIIAKPX 

Illllll::|l 

orfl3-l LIVRKEGNLLI ITHPX 

120 
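The identity figure quoted for this alignment can be reproduced by counting matching positions over the ungapped 126-residue overlap. The sketch below is illustrative: `percent_identity` is a hypothetical helper, the sequences are typed from <SEQ ID 62> and from residues 10-135 of <SEQ ID 64>, and unresolved residues (X) are assumed to count as mismatches.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# ORF13-1 (SEQ ID 62); 'X' marks residues left unresolved by the sequencing.
ORF13_1 = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIX"
    "FVHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNT"
    "GQEELEPGTRALIVRKEGNLLIITHP"
)

# ORF13a (SEQ ID 64), residues 10-135, i.e. the overlapping region.
ORF13A_OVERLAP = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIW"
    "FVHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNT"
    "GQEELEPGTRALIVRKEGNLLIIAKP"
)

print(round(percent_identity(ORF13_1, ORF13A_OVERLAP), 1))  # 94.4
```

Seven positions differ (two of them unresolved X residues), giving 119 matches out of 126.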



Homology with a predicted ORF from N. gonorrhoeae 

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from N. 
gonorrhoeae: 

orfl3 AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orfl3ng MTWFVAAVAVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60 

orfl3 VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111 

I I I I I I I I III Mi Mf M;:!llllll MINI =11111! 

orfl3ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120 

orfl3 LIVRKEGNLLI ITHP 126 

I t::l 

orfl3ng LIVRKEGNLLI IANP 135 

The complete length ORF13ng nucleotide sequence <SEQ ID 65> is: 



1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA 
This encodes a protein having amino acid sequence <SEQ ID 66>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA 
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR 
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP* 

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1 : 

10 20 30 40 50 

orf 13-1 .pep AVLIIELLTGTVYLLWSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl3ng MTWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 
10 20 30 40 50 60 

60 70 80 90 100 110 

orf 13-1. pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 

III 111:1:1:1111: I Mill : 

orfl3ng VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 
70 80 90 100 110 120 

120 

orf 13-1. pep LIVRKEGNLLIITHPX 

I I I I I I I I I I I I :: I 
orfl3ng LIVRKEGNLLIIANPX 



Based on this analysis, including the extensive leader sequence in this protein, it is predicted that
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies.

Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC..

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 



1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK

101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 
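
The partial sequence <SEQ ID 67> contains lower-case ambiguity codes (w, r, s, y), and the corresponding positions in <SEQ ID 68> are either resolved or shown as X. The following Python sketch shows one way such a translation could be derived; it assumes the standard IUPAC nucleotide codes and the standard genetic code, and the names `IUPAC` and `translate_ambiguous` are hypothetical rather than taken from the source.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# IUPAC nucleotide ambiguity codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1; emit X when a codon does not resolve to one residue."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if any(base not in IUPAC for base in codon):  # e.g. '.' for a missing base
            protein.append("X")
            continue
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 30 nucleotides of the partial strain B sequence.
print(translate_ambiguous("ATGTwTGATT TCGGTTTrGG CGArCTGGTT"))  # MXDFGLGELV
```

Here `TwT` (TAT or TTT) cannot be resolved and gives X, whereas `TTr` and `GAr` each encode a single residue (L and E), which agrees with the first ten residues of <SEQ ID 68>.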

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG 

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG

551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-l>: 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDFRPKHRAK PKLRVRKS* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>:



1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC
101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT
151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA
201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA
251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA
351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA
401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG
451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG
501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG
551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT
601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA



This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK

101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDLRPKSRAK PKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a: 

10 20 30 40 50 60 

orf2.pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR
         | |||||||||||||||||||||| |||||:|||||||||||||||||||||||||||||
orf2a    MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR



KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRT PADFGVDENGN PXS 



orf 2 .pep 
orf2a 



RCGKHPIRRHFRRYAV 



The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap:

orf2a.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1    MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

orf2a.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 1 I I I I I I I I I I I I I I I I I I I I I : I 
orf 2-1 KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120 

orf2a.pep DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

orf2-1    DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

orf2a.pep QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 22 9 

I I I I I I I I I I I I I I I I I I I I I I I I : I I I I II! 

orf2-l QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 22 9 

Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV* 

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tCCCCttCCC gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG 

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG 

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA 

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA 

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-l>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR 

151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT

201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa 
overlap with ORF2ng: 

orf2 .pep MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60 

: I I II I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I 

orf2ng MFDFGLGELI FVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60 

orf2.pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120

1:11 I I I I I I II I I I I I I I I I I : : : I I I I : I I 

orf2ng KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP 120 

orf2.pep RCGKHPIRRH FRRYAV 136 

I III I II 

orf2ng RYGKHRIRRHFRRYAV 136 

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-l) show 91.7% identity in 
229 aa overlap: 

10 20 30 40 50 60 

orf2-l.pep MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 

orf2ng-l MFDFGLGELI FVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 2-1 . pep KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 

orf2ng-1   KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP

orf2-1.pep DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV

orf2ng-1   DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV
130 140 150 160 170 180 

190 200 210 220 229 

orf 2-1 .pep Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 
I : I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
orf2ng-l QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX 
190 200 210 220 230 

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined),
and also revealed homology (34% identity, 59% positives over 88 aa) between the gonococcal
sequence and the TatB protein of E.coli:

gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07

Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1  MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60
          MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +
Sbjct: 1  MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60

Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87
            +K+  +A+   +   LK +  +++ +
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
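
The Identities and Positives figures in a BLAST report can be recovered from the middle line of each alignment row: a letter marks an identical pair and "+" marks a conservative substitution. The Python sketch below is illustrative only; the function name `blast_midline_stats` is hypothetical, and the two middle lines are a transcription of the TatB alignment above with the column spacing restored, which is an assumption because the scanned text collapses runs of spaces.

```python
def blast_midline_stats(midlines: list[str], aligned_length: int) -> dict[str, float]:
    """Derive identity/positive counts from BLAST middle lines."""
    # Letters in the middle line are identities; '+' marks a positive-scoring pair.
    identities = sum(ch.isalpha() for line in midlines for ch in line)
    positives = identities + sum(line.count("+") for line in midlines)
    return {
        "identities": identities,
        "positives": positives,
        "identity_pct": 100.0 * identities / aligned_length,
        "positive_pct": 100.0 * positives / aligned_length,
    }


# Middle lines of the ORF2ng-1 / TatB alignment (spacing restored by hand).
MIDLINES = [
    "MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +",
    "  +K+  +A+   +   LK +  +++ +",
]

stats = blast_midline_stats(MIDLINES, aligned_length=88)
print(stats["identities"], stats["positives"])  # 30 52
```

With these inputs the counts agree with the reported 30/88 (34%) identities and 52/88 (59%) positives.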

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane 
proteins and so the proteins from N.meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
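
A membrane-protein prediction of this kind typically rests on finding a strongly hydrophobic stretch. The source does not say which program was used, so the following Python sketch substitutes a plain Kyte-Doolittle sliding-window average as a stand-in; the function name `most_hydrophobic_window` and the window length of 9 are hypothetical choices, and the input is the first 60 residues of ORF2-1.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq: str, window: int = 9) -> tuple[int, str, float]:
    """Return (0-based start, segment, mean hydropathy) of the best window."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    best = None
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KYTE_DOOLITTLE[res] for res in segment) / window
        if best is None or score > best[2]:
            best = (start, segment, score)
    return best


orf2_1_nterm = "MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR"
start, segment, score = most_hydrophobic_window(orf2_1_nterm)
print(start + 1, segment, round(score, 2))
```

A high positive mean near the N-terminus is consistent with, but does not by itself establish, a membrane-spanning segment.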

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is
a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 77>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT 

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC kGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC 

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC

501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC

551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA 

601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG. . 

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence <SEQ ID 79>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA

951 AGGACAACCT TGA

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN

301 SHEGYGYSDE VVRQHRQGQP *

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA

951 AGGGCAACCT TGA

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN

301 SHEGYGYSDE AVRRHRQGQP *

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa
overlap with ORF15a:

10 20 30 40 50 60 

Orfl5.pep MQARLLIPILFSVFILSA CGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl5a MQARLLIPILFSVFILSA CGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 



KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I 
KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
70 80 90 100 110 120 

130 140 150 160 170 180 

LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF



190 200 210 

orfl5.pep FLRGIDVVS PANADT D VFINIDVFGT IRNRTEM 

I I I I I I I I I I I I I I I I I I I 

orfl5a FLRGIDVVS PAN ADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 

190 200 210 220 230 240 

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 



MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 



KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 



190 200 210 220 230 240 

FLRGIDVVS PANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 
FLRGIDVVS PANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 

190 200 210 220 230 240 



IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN 



SHEGYGYSDEAVRRHRQGQPX 



SHEGYGYSDEVVRQHRQGQPX

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>:



1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT
101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC
301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA
801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA
951 AGGGCAACCT TGA


This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>: 

1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN 

301 SHEGYGYSDE AVRQHRQGQP * 

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa 
overlap with ORF15ng: 



orflSng 

orfl5ng 
orfl5.pep 
orfl5ng 
orf 15. pep 
orf 15ng 



MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
I : I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I I I I I I I I I M I I I I I : I I I I I I I 1 I I I II I I I I I I I I I I I I I I I I I I I I I I I 
LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 

FLRGI DWS PANADTDVFINI DVFGT IRNRTEM 



I I I I I I 



I I 



FLRGI DVVS PANADTDVFINI DVFGT I RNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 2 40 



The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap: 



10 20 30 40 50 60 

orf 15-1 . pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 
I : I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orfl5ng MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 15-1 . pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

orfl5ng KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
70 80 90 100 110 120 



orfl5-l.pep 



130 140 150 160 170 180 

LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
|||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
130 140 150 160 170 180 



FLRGIDVVSPANADTDVF] 



orf15-1.pep SHEGYGYSDEVVRQHRQGQPX

I I I I I I I I I I : I I M I I I ! I 
orfl5ng SHEGYGYSDEAVRQHRQGQPX 
310 320 

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

This analysis also indicates a putative leader sequence, and it was predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
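
A lipid-attachment-site motif such as ILSAC can be located with a simple pattern search. The Python sketch below is an assumption-laden illustration: the regular expression approximates the classic PROSITE-style prokaryotic lipoprotein pattern as commonly quoted, it is not claimed to be the exact rule applied by the MOTIFS program, and the names `LIPOBOX` and `find_lipobox_cysteine` are hypothetical.

```python
import re
from typing import Optional

# Six residues other than D/E/R/K, two small or hydrophobic residues,
# one further permissive position, then [AGS] followed by the lipidated Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox_cysteine(protein: str, search_limit: int = 40) -> Optional[int]:
    """Return the 1-based position of the candidate lipidated Cys, or None."""
    match = LIPOBOX.search(protein[:search_limit])
    return match.end() if match else None


orf15_1_nterm = "MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQEL"
print(find_lipobox_cysteine(orf15_1_nterm))  # 19
```

For ORF15-1 the match ends at the cysteine of ILSAC (residue 19), so the mature lipoprotein would be expected to begin at that residue if the prediction holds.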

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>: 

1 . . GG . CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA 

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT 

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT 

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC 

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG 

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA 
301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT 

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA
501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC . TT CGGCATTATG 

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA 

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>:

Further work revealed the complete nucleotide sequence <SEQ ID 87>:



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA

751 Tc.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 XFGIMLLLIA GKMLYNLL*

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical H.influenzae transmembrane protein HI0902 (accession number P44070) 
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap: 



ORF17    3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59
           HK   +  + V  + P ++  VF  G F  +       +IF   +++L      ++  D
HI0902  72 HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLVVYLATKMVLSIKKD- 130

ORF17   60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119
           Q  ++ L  L  +     L G  SS  GIGGG   VPFL   G    +AIG+S+     +
HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189

ORF17  120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179
            +SG  S++++G     +PE SLG++YLPAV  ++A +   + LG
HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17  180 FGIMLLLIAGKM 191
           F + L+++A  M
HI0902 250 FALFLIVVAINM 261





Homology with a predicted ORF from N. meningitidis (strain A)

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N.
meningitidis:



orfl7 .pep 
orfl7a 



GQHKKQAVNGKT VFTMMPGMI FGVFTGA FS 
I I I I I I I I : I I I I I I I I I I : I I I I : I I : I 
QGLAQHPYAQHLA VGTSFAVMVFTAFSSML GQHKKQAVDWKT VFTMMPGMVFGVFAGA LS 



orf 17 .pep 
orfl7a 



AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG

GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV

GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 
170 180 190 200 210 220 



AVLSAATIAFAPLGV KTAHKLSSAKLKKS FGIMLLLIAGKMLYNLL X 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AVLSAATIAFAPLGV KTAHKLSSAKLKKS FGIMLLLIAGKMLYNLL X 
230 240 250 260 



The complete length ORF17a nucleotide sequence <SEQ ID 89> is: 



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA
301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA



This encodes a protein having amino acid sequence <SEQ ID 90>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 SFGIMLLLIA GKMLYNLL*

ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap: 



MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF 



70 80 90 100 110 120 

orf17a.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I : I I I I I I I I I I I I I II I I I I I I I 
orf 17-1 AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 17a . pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 

orf 17-1 AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 17a . pep IGTSSGLAWPIALSGAI SYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
orf 17-1 IGTSSGLAWPIALSGAI SYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 

190 200 210 220 230 240 



orf 17a. pep 



250 260 269 

HKLSSAKLKKSFGIMLLLIAGKMLYNLLX

Homology with a predicted ORF from N. gonorrhoeae

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N. gonorrhoeae:



orfl7.pep GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30 

I I : I I : I : I I I I I I I I I I: I I: I 

orfl7ng QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102 

orf 17 .pep AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90 

orfl7ng AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162 

orf 17 .pep GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150 

orfl7ng GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 202 

orf 17. pep AVLSAATIAFAPLGVKTAHKLSSAKLKKS FGIMLLLIAGKMLYNLL 196 

Ml 111:11111 Ill 

orfl7ng AVLSAATIAFAPLGVKTAHKLSSAKLKES FGIMLLLIAGKMLYNLL 268 

An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid
sequence <SEQ ID 92>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT

801 GCTTTAA

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL*

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap:

10 20 30 40 50 60 

orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
10 20 30 40 50 60

70 80 90 100 110 120 

orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
I I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I 
orfl7ng-l AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 17-1. pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 1 I ! I I I I I I I I I I I I I I I I I I I I I I I I 
orfl7ng-l AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 17-1. pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 

orfl7ng-l IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 
190 200 210 220 230 240 

250 260 269 

orf 17-1 .pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX 

orfl7ng-l HKLS S AKLKE S FG I MLLLI AGKMLYNLLX 

250 260 

In addition, ORF17ng-1 shows significant homology with a hypothetical H. influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23

Identities = 15/43 (34%), Positives = 23/43 (53%)

Query:  55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
           A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct:  52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
           L G  SS  GIGGG   VPFL   G    +AIG+S+     + +SG  S++V+G     +
Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
           PE SLG++YLPAV  ++A +   + LG     KL  + LK+ F + L+++A  M
Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIVVAINM 261
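The "Identities" and "Positives" figures reported above can be re-derived from the middle line of each alignment. The following is a minimal sketch only, assuming the usual BLAST convention that a letter in the middle line marks an identical residue and a "+" marks a conservative substitution; the function name `midline_stats` is hypothetical, and the truncation of percentages is an observation that happens to match the reported figures rather than a documented rule.

```python
def midline_stats(midline: str, length: int) -> tuple[int, int]:
    """Return (identities, positives) for a BLAST-style middle line.

    Assumption: letters mark identical residues and '+' marks
    conservative substitutions; blanks are neither.
    """
    padded = midline.ljust(length)  # trailing blanks are often stripped
    identities = sum(ch.isalpha() for ch in padded)
    positives = identities + padded.count("+")
    return identities, positives


# First segment of the ORF17ng-1 / HI0902 comparison (query residues 55-97).
query = "AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF"
mid = "A+GTSFA +V T   S    HK   + W+ +  + P ++  VF"

ident, pos = midline_stats(mid, len(query))
# Integer division truncates, which reproduces the 34% and 53% quoted above.
print(f"Identities = {ident}/{len(query)} ({100 * ident // len(query)}%)")
print(f"Positives = {pos}/{len(query)} ({100 * pos // len(query)}%)")
```

Applied to the first segment this gives 15/43 and 23/43, in agreement with the reported values.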

This analysis, including the homology with the hypothetical H. influenzae transmembrane protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>:

1 . . GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT 

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC 

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG

301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA



This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 . . GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG
201 R*
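The correspondence between a nucleotide sequence and the amino acid sequence it encodes can be checked mechanically. The sketch below is illustrative only: it assumes the standard genetic code (whose codon assignments are shared with the bacterial code), reads from the first base in frame 1, and reports any codon containing an undetermined base as X. The names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the original analysis.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', stops '*'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 97 and the first 10 residues of SEQ ID 98.
print(translate("ATGATTTTGC TGCATTTGGA TTTTTTGTCT"))  # MILLHLDFLS
# Last 15 bases of SEQ ID 97, ending in the TAA stop codon.
print(translate("GAAATTGGAAGATAA"))  # EIGR*
```

A codon with an undetermined base, such as NGA in the strain A sequence, is reported as X, which is consistent with the X residues shown in the corresponding amino acid sequences.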

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:

                                                 10        20        30
orf18.pep                                GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
orf18a     TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI

orf18.pep  CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
           ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a     CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS

orf18.pep  QLRLGGLTAALMQVSVLVLLLSEIGRX
           ||||||||||||| |||||||||||||
orf18a     QLRLGGLTAALMQXSVLVLLLSEIGRX
             180       190       200
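The identity figure quoted for an overlap can be recomputed by counting matching positions in the two aligned sequences. The following sketch assumes a gap-free alignment of equal-length strings and treats an undetermined residue (X) as a mismatch against a determined one; the function name `percent_identity` and this treatment of X are assumptions made for illustration, not a description of the program actually used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF18 (116 aa) as given in SEQ ID 96.
orf18 = ("GNGWQADPEHPLLGLFAVSNVSMTLAFVGI"
         "CALVHYCFSGTVQVFVFAALLKLYALKPVY"
         "WFVLQFVLMAVAYVHRCGIDRQPPSTFGGS"
         "QLRLGGLTAALMQVSVLVLLLSEIGR")

# Overlapping part of ORF18a: identical except for two undetermined (X)
# residues, as shown in the alignment.
orf18a_overlap = orf18.replace("CFSG", "CFSX").replace("MQVS", "MQXS")

print(round(percent_identity(orf18, orf18a_overlap), 1))  # 98.3
```

With 114 of 116 positions in agreement the result is 98.3%, matching the stated figure.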

The complete length ORF18a nucleotide sequence <SEQ ID 99> is:



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 
551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 
601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>: 



1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG
201 R*

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap: 

                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| |||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200



Homology with a predicted ORF from N. gonorrhoeae 

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N.
gonorrhoeae:

orf18.pep                                GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                         ||||||||||||||||||||||||||||||
orf18ng    TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI

orf18.pep  CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng    CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS

orf18.pep  QLRLGGLTAALMQVSVLVLLLSEIGR 116
           ||||| |:| ||||:| ::||:||||
orf18ng    QLRLGVLAAMLMQVAVTAMLLAEIGR 201



The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt 

51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG 

151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG

551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC

601 AGATGA

This encodes a protein having amino acid sequence <SEQ ID 102>: 



1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R*

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1:

orf18-1.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
             ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng      MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf18-1.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
             ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf18ng      LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf18-1.pep  YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng      YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                    130       140       150       160       170       180

orf18ng      VLAAMLMQVAVTAMLLAEIGRX



Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
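The text does not state how the putative transmembrane domains were identified. One common approach, shown here purely as an illustrative sketch, is a sliding-window hydropathy average on the Kyte-Doolittle scale; the window of 19 residues and the threshold of 1.6 are conventional choices for that scale and are assumptions here, as is the function name `hydrophobic_windows`.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean) for each window whose mean hydropathy is
    at or above the threshold; start positions are 1-based.
    Residues not in the table (e.g. X) contribute 0.0."""
    seq = protein.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD.get(res, 0.0) for res in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


# N-terminal 30 residues of ORF18-1 (SEQ ID 98).
n_term = "MILLHLDFLSALLYAAVFLFLIFRAGMLQW"
for start, mean in hydrophobic_windows(n_term):
    print(start, mean)
```

Runs of consecutive windows above the threshold would be candidate membrane-spanning segments; such a scan is only a first indication and does not replace the analysis actually performed.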



Example 13 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>: 



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 

101 GAX. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 105>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL
151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289)
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap:

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK    5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK   65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N.
meningitidis:

orf19.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
orf19a     MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK

orf19.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX



The complete length ORF19a nucleotide sequence <SEQ ID 107> is:

1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA
501 CGCCTACGAA GCACTCGGCA GGTACCTCGA AGCCAAAGCC GACTTTTTCG
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151






This encodes a protein having amino acid sequence <SEQ ID 108>: 



1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*



ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap:

orf19a.pep  MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK

orf19a.pep  NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
orf19-1     NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY

orf19a.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
            |||||||||||||||||||||||||||||:||||:||||||||:|||||:|||:||||||
orf19-1     TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
                   130       140       150       160       170       180

orf19a.pep  DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ

orf19a.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
                   250       260       270       280       290       300

orf19-1     RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA

orf19a.pep  ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV

orf19a.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF

                   550       560       570       580       590       600
orf19a.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19a.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710



Homology with a predicted ORF from N. gonorrhoeae

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N.
gonorrhoeae:

orf19.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60
orf19ng    MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60

orf19.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103
           |||:||||||||||||||||||||||||||||||  ||||||
orf19ng    NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino 
acid sequence <SEQ ID 110>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT *

Further work revealed the complete nucleotide sequence <SEQ ID 111>:



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC 

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 

851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA 

901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 

951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca

1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa

1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT

1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC

1201 ATCGTCgaag CCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC

1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC

1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC

1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT

1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA

1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT

1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA

1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA

1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT

1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA

1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT

2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG

2151 A

This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>:



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap:

                     10        20        30        40        50        60
orf19-1.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf19-1.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf19-1.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
             |||||||||||||||||||||||||||||:||||:||||||||||||||:||||||||||
orf19ng-1    TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf19-1.pep  DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1    DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf19-1.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
orf19ng-1    DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG
                    250       260       270       280       290       300

orf19-1.pep  RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA

orf19-1.pep  ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
orf19ng-1    ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV

orf19-1.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF

orf19-1.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL

             PGFTLLKTGYALTGYI

orf19-1.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
             ||||||||||| ||||:||||||||||||||||||||||||||||||||||||||||
orf19ng-1    QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                    670       680       690       700       710



In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein
previously entered in the databases:

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417
Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203
Identities = 301/326 (92%), Positives = 306/326 (93%)

RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS

ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT

IQALTSLSLAGLDVYAAKPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG

Query: 307
Sbjct:   1
Query: 367
Sbjct:  61
Query: 427
Sbjct: 121
Query: 487
Sbjct: 181

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632
           K   ALTGYISALG   ++  +  +P
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAA? 325

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
113>: 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGAGTT 

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT 

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA 

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA

751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 

951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCaj|TCGGAC TTTCGCTTGC CATCGGTCTG 

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 

451 SRSP* 
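
The relationship between the DNA listing and the amino acid listing, including the 'X' residues, can be illustrated with a short sketch. It assumes the standard genetic code and assumes that any codon containing an unresolved or ambiguous base is reported as 'X'; the name `translate` is hypothetical, and dedicated translation tools may resolve some ambiguity codes that this sketch does not.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons not in the table become 'X'."""
    dna = dna.upper()  # lower-case bases in the listing are treated as normal
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Start of SEQ ID 113, and a stretch containing an unresolved base ('.').
print(translate("ATGAATATGCTG"))
print(translate("AAAGAGGCGG.CGAAGCC"))
```

A frame-1 translation is only meaningful while the reading frame is intact, so an unresolved base that stands for an insertion or deletion would need separate handling.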

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is:



1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 






101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

4 01 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA 

4 51 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

7 01 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC 

7 51 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

8 01 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG 
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 
951 GACGCTGCCG GCGGCGGTCG GACTGGGGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTGGGT TTAATCGGCT TAATCATGAT 

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 

14 01 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

14 51 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAG M LSFVLVIVTA 

101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV

151 LNSYHKFGIP AFTPT FLNVS FIVFALFFVP YF DPPVTALA WAVFV GGILQ 

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQ MAPAILGV SVAQVSLVI N 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LR LCMLLTLP AAVGLAVLS F PLVATLFMYR EFTLFDAQMT 

351 QHA LIAYSFG LIGLIMIKVL APGFY ARQN I KTPVK IAIFT LICTQLMNLA 

4 01 FIGPLKHVGL S LAIGLGACI NAGLLFYL LR RHGIYQPGKG WA AFLAKMLL 

4 51 SLAVMCGGL W AAQAYLPFEW AHAGGMRKAG Q LCILIAVGG GLYFASLA AL 

501 GFRPRHFKRV EN* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 

ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
MviN 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Orf20 61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
+QAFVPILAEYK + +EA F+ +V+G+L+ L +VT G+LAAPWVI V+AP FA
MviN 74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
ADKF L+ LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF P
MviN 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D RV+KQM PAILGV
MviN 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK A+ +
MviN 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
+++ L+DWGLRLC LL LP+AV L +L+ PL +LF Y FT FDA MTQ ALIAYS G
MviN 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420
LIGLI++KVLAPGFY+RQ+I PVKIAI TLI QLMNL F C+
MviN 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440
NA LL++ LR+ I+ P G
MviN 434 NASLLYWQLRKQNIFTPQPG 453
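
The identity figures quoted for these alignments are the fraction of aligned positions carrying the same residue. A minimal sketch of that calculation for two already-aligned rows is given below. The function name `percent_identity` is hypothetical, and the treatment of 'X' and gap characters as non-matching is an assumption of the sketch rather than a statement of how the original alignment program scored them.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned positions with identical, resolved residues."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    # Unknown residues ('X') and gaps ('-') never count as identities.
    matches = sum(
        a == b and a not in "X-"
        for a, b in zip(row_a.upper(), row_b.upper())
    )
    return 100.0 * matches / len(row_a)


# First row of the Orf20 / MviN alignment (residues 1-60 and 14-73).
orf20_row = "MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF"
mvin_row = "MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF"
print(round(percent_identity(orf20_row, mvin_row), 1))
```

This row-level figure covers only one 60-residue block; the overall percentage reported for an alignment is taken over the full overlap.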



Homology with a predicted ORF from N. meningitidis (strain A)

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis:

10 20 30 40 50 60 

orf 20 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

or f 20a MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGRF 
10 20 30 40 50 60 



70 80 90 100 110 120 

or f 20 . pep AQAFVPI LAE YKETRSKEAXEAFIRHVAG MLS FVLVI VTALG I LAA PWVI YVSAPS FAQD 

III I I I I I I I I I I : I I II I I I I I I I I I I I I I I I : i I : I 

orf 20a AQAFVPILAEYKETRSKEATEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPGFAKD 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf20.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP

orf20a ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP

130 140 150 160 170 180 



190 200 210 220 230 240 

YFDPP VTAXAWAVFVGGILQLX FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I 

YFDP P VTALAWAVFVGGILQLG FQL PWLAKLG FLKLPKLS FKDAAVNRVMKQ MAPAI LGV 

190 200 210 220 230 240 



310 320 330 340 350 360 

or f 20 . pep EQFSALLDWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQHA LIAYSFG 

orf 20a EQFSALLDWGLR XCMLLTLPAAVGMAVLS FPLVATLFMYREFTLFDAQMTQHA LIAYSFG 
310 320 330 340 350 360 



370 380 390 400 410 420 

orf 20 .pep LIGLIMIKVL APGFYARQNIXXPVK IAIFTLICXQLMNLXFX GPLXXIGLS LAIGLGACI 

orf 20a LIGLIMIKVL APGFYARQNIKTPVK IAIFTLICTQLMNLAFI GPLKHVGLS LAIGLGACI 
370 380 390 400 410 420 






The complete length ORF20a nucleotide sequence <SEQ ID 117> is: 

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

4 01 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA 

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC 

7 51 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT 

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

14 01 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAG M LSFVLVIVTA 

101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRIT FPYILL ISLSSFVGSV 

151 LNSYHKFSIP AFTPT FLNVS FIVFALFFVP YF DPP VTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQ MAPAILGV SVAQISLVI N 

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LR XCMLLTLP AAVGMAVLS F PLVATLFMYR EFTLFDAQMT 

351 QHA LIAYSFG LIGLIMIKVL APGFYARQNI KTPVK IAIFT LICTQLMNLA 

4 01 FIGPLKHVGL S LAIGLGACI NAGLLFYL LR RHGIYQPGKG W AAFLAKMLL 

4 51 SLAVMGGGL Y AAQIWLPFDW AHAGGMQKAA R LFILIAVGG GLYFASLA AL 

501 GFRPRHFKRV ES* 

ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 

10 20 30 40 50 60 

MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

: I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
10 20 30 40 50 60 

70 80 90 100 110 120 

AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVI YVSAPGFAKD 

II I I I I I I I I I I I : II I I I I I I II I II II I I I I I I I I I I I : I 

AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
70 80 90 100 110 120 

130 140 150 160 170 180 

ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFS I PAFTPTFLNVS FIVFALFFVP 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 11 Ill 

ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVS FIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
190 200 210 220 230 240 

250 260 270 280 290 300 

SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 



orf20a.pep EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

orf20-1 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
310 320 330 340 350 360 



Homology with a predicted ORF from N. gonorrhoeae 

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N. 
gonorrhoeae: 

orf20.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I 
orf20ng MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

orf20.pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120 
orf20ng AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120 

orf20.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 
I I I I I I I I : I I I I I I I I I I I I I I I I I I I I : i 11 I 1 I I I I I I I I I : I I I : I I I I I I I I ! I I 
orf20ng ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180 

orf20.pep YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240 
orf20ng YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240 

orf20.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300 
orf20ng SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300 

orf20.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360 
I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I INI 
orf20ng EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360 

orf20.pep LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420 
orf20ng LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 

orf20.pep NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454 
I I I I I I : I : I : I I I I : I III: : I I I I I I I 
orf20ng NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454 

An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino 
acid sequence <SEQ ID 120>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA 

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI 

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA 

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC 

451 SRSP* 

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 

1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA 

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA 

4 51 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT 

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta 

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT 

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC 

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG 

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA 

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC 

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC 

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA 

101 LGILAA PWVI YVSAPGFTKD ADKFQLSISL LRIT FPYILL ISLSSFVGSI 

151 LNSYHKFGIP AFTPT FLNIS FIVFALFFVP YF DPP VTALA WAVFVGGILQ 

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQ MAPAILGV SVAQISLVI N 

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LR LCMLLTLP AAAGLAVLS F PLVATLFMYR EFTLFDAQMT 

351 QH ALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVK IAIFT LICTQLMNLA 

4 01 FIGPLKHAGL S LAIGLGACI NAGLLFFL LR KHGIYRPGRG W AAFLAKMLL 

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL 

501 GFRPRHFKRV ES* 

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap: 

10 20 30 40 50 60 
orf20-1.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
orf20ng-1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

orf20-1.pep AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
I I I I I I I I I I I I I I I I i I I : I I I I I I I I I I I I I I I : : I I I I I I I I I I I I I I I I I I I I : : I 
orf20ng-1 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 
70 80 90 100 110 120 

130 140 150 160 170 180 
orf20-1.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP 
orf20ng-1 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 
130 140 150 160 170 180 

orf20-1.pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 

orf20-1.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 
I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf20ng-1 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 

orf20ng-1 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

orf20-1.pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 
I I I I I I : I I I : I I I I : I I : I I I I I I I I I I I : I I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf20ng-1 NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG 
430 440 450 460 470 480 

orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I 
orf20ng-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVESX 
490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium: 

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella 
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium] 
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524 

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220 
Identities = 309/467 (66%), Positives = 368/467 (78%) 

Query: 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 
MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF 
Sbjct: 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 

Query: 61 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120 
+QAFVPILAEYK + +EAT F+ +V+G+L+ L VVT G+LAAPWVI V+APGF 
Sbjct: 74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133 

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180 
ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P 
Sbjct: 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193 

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240 
YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D RV+KQM PAILGV 
Sbjct: 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253 

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300 
SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ + 
Sbjct: 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313 

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360 
+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G 
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373 

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 
LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+ 
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433 

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467 
NA LL++ LRK I+ P GW VM L+ +P 
Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480 

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220 
Identities = 14/41 (34%), Positives = 23/41 (56%) 

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509 
EW+ + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAWIAGIAAYFAALAVLGFKVKEFVR 521 
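
The "Identities" and "Positives" lines of such a report give a count, an alignment length and a whole-number percentage. A small sketch for reading these lines and checking that the percentage agrees with the counts is shown below. The helper names `parse_ratios` and `is_consistent` are hypothetical, and the tolerance of one percentage point is an assumption chosen to accept either rounding or truncation of the reported figure.

```python
import re

# Matches e.g. "Identities = 309/467 (66%)".
_RATIO = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_ratios(line: str) -> dict:
    """Return {'Identities': (hits, length, percent), ...} for one line."""
    return {
        name: (int(hits), int(length), int(percent))
        for name, hits, length, percent in _RATIO.findall(line)
    }


def is_consistent(hits: int, length: int, percent: int) -> bool:
    """True if the reported percentage is within one point of hits/length."""
    return abs(100.0 * hits / length - percent) < 1.0


line = "Identities = 309/467 (66%), Positives = 368/467 (78%)"
for name, values in parse_ratios(line).items():
    print(name, values, is_consistent(*values))
```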





Based on this analysis, including the homology with a virulence factor from S. typhimurium, it is 
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>: 

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA tGGACACCAA TCCG . . 

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI 

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNP. . 

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

7 01 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 
751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

8 01 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 
8 51 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 
901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 
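
Whether a nucleotide sequence is a complete open reading frame, as opposed to a partial one, can be checked mechanically. The sketch below assumes an ATG start codon and the three standard stop codons; the function name `orf_problems` is hypothetical, and alternative start codons used by some bacterial genes are deliberately not considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(dna: str) -> list:
    """List the reasons why `dna` is not a complete ORF (empty if none)."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of three")
    if not dna.startswith("ATG"):
        problems.append("no ATG start codon")
    if dna[-3:] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    # Scan every in-frame codon except the last for premature stops.
    if any(dna[i:i + 3] in STOP_CODONS for i in range(0, len(dna) - 3, 3)):
        problems.append("internal stop codon")
    return problems


# The first 21 bases of the sequence above form an open but unterminated frame.
print(orf_problems("ATGATTAAAATCAAAAAAGGT"))
print(orf_problems("ATGAAATGA"))
```

Applied to the full listing, a check of this kind would be expected to distinguish the partial sequence, which stops without a terminator, from the complete sequence, which ends in TGA.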

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>: 

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

4 01 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 

10 20 30 40 50 60 

orf 22 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 

orf22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 

II I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf22a KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 

orf 22a NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 
130 140 150 160 170 180 

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 

10 20 30 40 50 60 

orf 22a. pep MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 
orf 22-1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 



orf22a.pep KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 
II I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf22-1 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR 

orf22a.pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV 
I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I II: II 
orf22-1 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 

orf22a.pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

orf22a.pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 



Further work identified a partial gene sequence <SEQ ID 129> from N. gonorrhoeae, which 
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 
51 VKKGQVLFED KKN PGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI 
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWT I NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI 

301 SGSVLNGAIA QGAHDYLGRY HN* 

Further work identified the complete gonococcal gene <SEQ ID 131>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA 

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT 

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWT I NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI 

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa 
overlap with ORF22ng: 



orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60 
||||||||||||||||||::||||||||||||||||:|||||||:|||:||||||||||| 
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60 

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120 
orf22ng KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120 

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158 
orf22ng NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180 



The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng) show 96.2% 
identity in 447 aa overlap: 



                     10        20        30        40        50        60
orf22-1.pep  MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
orf22ng-1    MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf22-1.pep  KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGEEVRR
orf22ng-1    KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf22-1.pep  NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
orf22ng-1    NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf22-1.pep  LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
orf22ng-1    LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf22-1.pep  NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI
orf22ng-1    NYQDVIAIGRLFVTGRLNTERVVALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf22-1.pep  SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
orf22ng-1    SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf22-1.pep  LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
             ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf22ng-1    LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA
                    370       380       390       400       410       420

                    430       440
orf22-1.pep  LCSFVCPGKYEYGPLLRKVLETIEKEGX
orf22ng-1    LCSFVCPGKYEYGPLLRKVLETIEKEGX
                    430       440

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492).
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap:

orf22   1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
          MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa   1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22  61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
          KKNPGVVFTAPASG + I +RGEKRVLQSVVI VE +++I F RY LA+LS E+V++
48kDa  61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
          NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158



ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 530 bits (1351), Expect = e-150
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Query: 1 MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
         MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

[Remaining alignment blocks (Sbjct residues 61-449) are not legible in the source.]

ORF22ng-1 also shows homology with the OMP from A. pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449
Score = 555 bits (1414), Expect = e-157
Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

[Query and Sbjct sequence lines are not legible in the source; the surviving consensus lines are listed by block start position. The first block (Query 27, Sbjct 1) has no surviving consensus line.]

Query 87, Sbjct 61: KKNPGVVFTAPASG + I+RGEKRVLQSVVI VEG+++I F RY LA LS+E+V++
Query 147, Sbjct 121: NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE
Query 207, Sbjct 181: ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V
Query 264, Sbjct 241: W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL
Query 324, Sbjct 301: RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Query 384, Sbjct 361: K KLF FTTAV+GG+RAMVPIG YERVM
Query 444, Sbjct 420: ++VCPGK YGP+LR LE IEKEG

Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>: 

1 . . GCGnCGnAAA TCATCCATCC CC.nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC 

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCmmmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 



1 . .AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL 

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF 

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT 

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI 

301 WVFVLGLPVG PGAPTFYPAP * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC

1551 ATTCTATCCC GCACCTTAA



This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:

1 MSQTDTQRDG RFLRTVEWLG

51 VPDPRPVGAK GRADDGLIYI

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf12.pep                                 AXXIIHPXXVVGPEANWFFMVASTFVIALI

orf12.pep   GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12a      GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV
                 240       250       260       270       280       290

                   100       110       120       130       140       150
orf12.pep   PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS
orf12a      PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMS
                 300       310       320       330       340       350

                   160       170       180       190       200       210
orf12.pep   TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM
orf12a      TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM
                 360       370       380       390       400       410

                   220       230       240       250       260       270
orf12.pep   IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY

                   280       290       300       310       320
orf12.pep   KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
            |||||||||| ||||||||||||||||||||||||||||||||||||||||
orf12a      KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
                 480       490       500       510       520

The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 

1551 ATTCTATCCC GCACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 138>: 

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*

ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap: 

                     10        20        30        40        50        60
orf12a.pep   MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK
             ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12-1      MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK

orf12a.pep   GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
             ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12-1      GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf12a.pep   LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1      LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf12a.pep   GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
orf12-1      GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf12a.pep   VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1      VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf12a.pep   PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
orf12-1      PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf12a.pep   IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
orf12-1      IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                    370       380       390       400       410       420

                    430       440       450       460       470       480
orf12a.pep   AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1      AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
                    430       440       450       460       470       480

                    490       500       510       520
orf12a.pep   LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
orf12-1      LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
                    490       500       510       520

Homology with a predicted ORF from N. gonorrhoeae 

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N. 
gonorrhoeae: 



orf12.pep                                AXXIIHPXXVVGPEANWFFMVASTFVIALI  30
                                         |  ||||  |||||||||||:|||||||||
orf12ng    AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALI  232

orf12.pep  GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV  90
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng    GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV  292

orf12.pep  PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS  150
           ||||||||||||||:|||||||||||||||||||||||||:|||||||:||||| |||||
orf12ng    PADGILRHPETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMS  352

orf12.pep  TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM  210
           || | |  |||||||||||||||||||||||||:|||:  ||||||||||||||||||||
orf12ng    TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM  412

orf12.pep  IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY  270
orf12ng    IGSASAQWAVTAPIFVPMLM                                          472

orf12.pep  KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP  320
orf12ng    KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP  522

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is: 

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA 

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA 

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC 

1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC 

1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS 

51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1: 

                      10        20        30        40        50        60
orf12-1.pep   MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
              |||||::|:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng       MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf12-1.pep   GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
orf12ng       GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf12-1.pep   LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
              ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf12ng       LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVS
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf12-1.pep   GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
              ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12ng       GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKI
                     190       200       210       220       230       240

                     370       380       390       400       410       420
orf12-1.pep   IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
orf12ng       IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                     370       380       390       400       410       420

                     430       440       450       460       470
orf12-1.pep   AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT

orf12-1.pep   LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
              ||||||||||||||||||||||||||||||||||:|||||:||
orf12ng       LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX



In addition, ORF12ng shows significant homology with a hypothetical protein from E. coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89

Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

Query: 8   RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67
           +SG+ VE +GN +PHP +A+ + FG+S +P D
Sbjct: 13  QSGKLYGWVERIGNKVPHPFLLFI YLIIVLMVTTAILSAFGVSAKNP TDGTP 64

Query: 68  IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127
           + V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + +
Sbjct: 65  WVKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124

Query: 128
           V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL
Sbjct: 125

Query: 188
           r T D LL+GI+ +AA NW+FMA+S V+ ++G +T+KI+EP+LG
Sbjct: 185

Query: 248 YQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRHPETGLVA 307
           +Q + 4-+ + + S GL AGW + A +A ++P +GILR P V
Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298

Query: 308 GSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLXXXXXXXXX 367
           SPF+K IV I L F + + YG TR++R + ++ + M E M + ++
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427
           NW+N+G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
           VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419

Query: 488
Sbjct: 479


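The bracketed percentages in the statistics line can be checked by simple arithmetic on the reported fractions. The following is a minimal sketch, assuming that each percentage is the integer part of the ratio. That assumption is inferred from the figures quoted here (for example, 15/507 is reported as 2%) rather than stated in the source, and the function name `blast_percentages` is illustrative.

```python
import re


def blast_percentages(line: str) -> dict[str, int]:
    """Recompute the truncated percentages from 'Name = num/den' fields."""
    result = {}
    for name, num, den in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        result[name] = 100 * int(num) // int(den)  # integer part only
    return result


line = "Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)"
print(blast_percentages(line))
```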

Based on this analysis, including the presence of several putative transmembrane domains and the
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

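The text does not state which method was used to predict the transmembrane domains. As an illustration only, one common approach is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale, a 19-residue window, and a threshold of 1.6 on the window mean. These are assumptions typical of that method, not parameters taken from the original analysis, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm(seq: str, window: int = 19, threshold: float = 1.6) -> bool:
    """True if any window mean reaches the (assumed) threshold."""
    return any(s >= threshold for s in hydropathy_windows(seq, window))


# Two 19-residue stretches taken from the ORF12-1 sequence listed above.
print(has_candidate_tm("IIFIVLLLIASAVGAYFGL"))  # hydrophobic stretch
print(has_candidate_tm("SQEEKDIRHSNEITPLEYK"))  # charged, hydrophilic stretch
```

A scan of this kind only flags candidate segments; it does not by itself establish membrane topology or surface exposure.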
Example 17

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>: 



1 .. ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG

501 ACT..











This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 . . TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG 

151 RXLTNPTVSV RIMLHSG. . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)
ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis: 

                                                  10        20        30
orf14.pep                                 TAGAAGXXVFVFVTDSQVEVFGNIQTAVET
                                          |:||||  |||||||:|::||||:| ||||

orf14.pep   GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS
orf14a      GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS
           210       220       230       240       250       260

                   100       110       120       130       140       150
orf14.pep   VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG
orf14a      VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG
           270       280       290       300       310       320

                   160
orf14.pep   RXLTNPTVSVRIMLHSG
orf14a      RSLTNPTVSVRIMLHSGLMYSRRAVVSSVAKSWSFAYMPDLVSRLNRLDLPTLVX
           330       340       350       360       370       380

The complete length ORF14a nucleotide sequence <SEQ ID 143> is:

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA ?CAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 

This encodes a protein having amino acid sequence <SEQ ID 144>: 



1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118. 
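
The position of an internal stop codon can be confirmed directly from the nucleotide sequence. The following is a minimal sketch, assuming the standard stop codons (TAA, TAG, TGA) and a fragment that begins on a codon boundary. The function name `in_frame_stops` and its `first_codon` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(dna: str, first_codon: int = 1) -> list[int]:
    """Return the codon numbers of in-frame stop codons.

    `first_codon` is the number of the codon at which `dna` starts.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [
        first_codon + i // 3
        for i in range(0, usable, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Nucleotides 301-360 of the ORF14a sequence above, i.e. codons 101-120.
fragment = "CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA TTAAAACCGC"
print(in_frame_stops(fragment, first_codon=101))
```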



Homology with a predicted ORF from N. gonorrhoeae 

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 



orf14.pep                                TAGAAGXXVFVFVTDSQVEVFGNIQTAVET  30
orf14ng    GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET  208

orf14.pep  GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS  90
           ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf14ng    GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS  268

orf14.pep  VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG  150
           ||||||||||| |||||||||||||||||| |||||||||||||:|||:|||||||||||
orf14ng    VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG  328

orf14.pep  RXLTNPTVSVRIMLHSG  167
           | |||||||||||||:|
orf14ng    RSLTNPTVSVRIMLHAGLMYSRRAVVSRVAKSWSFAYMPDLVSRLNRLDLPTLV  382

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein 
having amino acid sequence <SEQ ID 146>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF

51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK

101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR 

151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA 

301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR 

351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV* 

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 

1 . . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 
51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA 
101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 
151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC 
201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA 
251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC

301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC 
351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG 
401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG 
451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC 
501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC..

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 

1 . . GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 
51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 
101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK
151 EYXPETYARY HGIDVAANQE KANWIALLKX A..

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA

This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N. 
meningitidis: 



orf16.pep                                   GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV

orf16.pep     MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI
orf16a        MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGI
                 110       120       130       140       150       160

orf16.pep     EYXPETYARYHGIDVAANQEKANWIALLKXA

orf16a        AENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGA
                 290       300       310       320       330       340

The complete length ORF 16a nucleotide sequence <SEQ ID 151> is: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT 

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC 

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 
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1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT 

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 



This encodes a protein having amino acid sequence <SEQ ID 152>: 



1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*



ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap:



                      10        20        30        40        50        60
orf16a.pep    MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1       MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf16a.pep    ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1       ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf16a.pep    LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1       LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf16a.pep    FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
              |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf16-1       FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf16a.pep    ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1       ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf16a.pep    EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1       EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                     310       320       330       340       350       360

                     370       380       390       400       410       420
orf16a.pep    LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1       LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                     370       380       390       400       410       420

                     430       440       450
orf16a.pep    GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
              ||||||||||||||||||||||||||||||||
orf16-1       GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                     430       440       450
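
The identity figures quoted for these alignments are the proportion of aligned columns holding the same residue in both rows. A minimal sketch follows, assuming two pre-aligned rows of equal length with `-` marking gaps. The function name `percent_identity` is hypothetical, and the exact counting convention of the alignment program used in this work is not stated, so the treatment of gap columns here is an assumption.

```python
def percent_identity(a: str, b: str) -> tuple[float, int]:
    """Return (% identity, overlap length) for two aligned rows.

    Assumption: a column in which both rows carry a gap is ignored;
    a column with a gap in one row counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns), len(columns)


# Short fragments around the N/D difference between ORF16a and ORF16-1.
row_a = "FKVKEYNPETYARYHGIDVA"
row_b = "FKVKEYDPETYARYHGIDVA"
pct, overlap = percent_identity(row_a, row_b)
print(f"{pct:.1f}% identity in {overlap} aa overlap")
```

On the same convention, two differing positions in a 451-residue overlap give 449/451, which rounds to the 99.6% quoted above.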
Homology with a predicted ORF from N. gonorrhoeae

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N. gonorrhoeae:

orf16.pep                                   GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV  30
orf16ng       HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV  131

orf16.pep     MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI  90
orf16ng       MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI  191

orf16.pep     QSFLANTGAVVAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK  150
              ||||||| |||||||||||||||||||| |||||||||||||||||||:||||||| |||
orf16ng       QSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK  251

orf16.pep     EYXPETYARYHGIDVAANQEKANWIALLKXA  181
orf16ng       EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI  311

The complete length ORF16ng nucleotide sequence <SEQ ID 153> is:

1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA
51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT
101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT
151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT
201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG
251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT
301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG
351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT
401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC
451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT
501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC
551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG
601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA
651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG
701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC
751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC
801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA
851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC
901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA
951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG
1001 GCGTTTTGGC GGCGGTGTAG

This encodes a protein having amino acid sequence <SEQ ID 154>:

1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD
101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA
151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA
201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV*

ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap:

                      30        40        50        60        70        80
orf16-1.pep   MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT
orf16ng       DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT
                      50        60        70        80        90       100

                      90       100       110       120       130       140
orf16-1.pep   WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16ng       WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
                     110       120       130       140       150       160
                    150       160       170       180       190       200

orf16-1.pep   MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTV

orf16ng       MQPFKMMVGDMVNEEQKSYAYGIQSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTV
             170       180       190       200       210       220

orf16-1.pep
orf16ng

orf16-1.pep   VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS
              || |||||||||:||||||||||||||||||||||||:||||| |||||||
orf16ng       VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX
             290       300       310       320       330       340

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
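
The text does not state which program predicted the transmembrane domains. As an illustration only, the sketch below scans a protein with a sliding-window Kyte-Doolittle hydropathy average, one common way of flagging candidate membrane-spanning stretches. The function name `hydrophobic_windows`, the 19-residue window, the 1.6 threshold, and the scoring of unknown residues (X) as 0.0 are all assumptions, not the method used in this work.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Yield (start, mean) for each window whose mean hydropathy exceeds threshold.

    start is 1-based; residues missing from KD (e.g. X) score 0.0 (assumption).
    """
    scores = [KD.get(aa, 0.0) for aa in protein]
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > threshold:
            yield start + 1, mean


# Residues 91-120 of ORF16-1 <SEQ ID 150>.
segment = "LPYLLYGTLIAVIVMILMPNSGSFGFGYAS"
for start, mean in hydrophobic_windows(segment):
    print(start, round(mean, 2))
```

Windows reported by such a scan are only candidates; a dedicated topology predictor would be needed to assign membrane-spanning segments with any confidence.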



Example 19 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT . . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 



1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT 
101 PSYXCHQALP VKLGSXGSQN. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-l>: 

1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT
101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK
151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS
201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N. 
meningitidis: 

                      10        20        30        40        50        60
orf28.pep     MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK

orf28a        MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK

orf28.pep     GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN
              ||||||||||||||||||||  |||| ||||| ||:|  :| :  :||||||| | :|||
orf28a        GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
                      70        80        90       100        110



The complete length ORF28a nucleotide sequence <SEQ ID 159> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT
51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA
101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG
251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC
301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG
351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC
401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC
451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA
501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC
551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG
601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT
651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT
701 CCTCAGACAA ATGA



This encodes a protein having amino acid sequence <SEQ ID 160>: 

1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN
101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL
151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK
201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK*

ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 



orf28a.pep    MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK
              |||||||||||||||||||||:|:||||:| ||| :||||||||||||||||||||||||
orf28-1       MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK

orf28a.pep    GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
orf28-1       GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN

orf28a.pep    FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF
              |||||||||||||:|||||||||| ||||:||||||||||||||||||||||||||||||
orf28-1       FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
                     130       140       150       160       170       180

orf28a.pep    EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX
              |||||||||||||::|||||||| || |||  |||||:|||||||:||| |::::: ||
orf28-1       EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX
                     190       200       210       220       230

Homology with a predicted ORF from N. gonorrhoeae 

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N. 
gonorrhoeae: 

orf28.pep     MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK  60
orf28ng       MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK  60

orf28.pep     GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN  120

orf28ng       GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN  120

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT
51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA
101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG
251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC
301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG
351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA
401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA
451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA
501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG
551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC
601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC
651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT
701 CCTCAGACAA ATGA



This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT
101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK
151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS
201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK*

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap: 



                      10        20        30        40        50        60
orf28-1.pep   MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK
              |||||||||||||||:|||||:|| |||||||:|||||||||||||||||||||||||||
orf28ng       MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK

                      70        80        90       100       110       120
orf28-1.pep   GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN
              ||||||||||||:|||||||||||:|||||||||||||||||||||||||||:|:|||||
orf28ng       GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf28-1.pep   FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
              ||| |||||||| :| |||||||| |:|||||||||||||||||||||||||||||||||
orf28ng       FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
                     130       140       150       160       170       180

190 200 210 220 230 239 

EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX

II I! II ' I M II II I I I M II II I : M I M : i I I : : | ::|: 

EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (24kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm 
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen. 
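
A size such as the 24kDa quoted for ORF28-1 can be compared with a value computed from the amino acid sequence. The sketch below sums average residue masses and adds one water for the free termini. The function name `molecular_mass` is hypothetical, the residue masses are standard average values, and the result ignores any leader-peptide cleavage, lipid modification, or fusion tag, so it should be read as a rough estimate rather than the calculation behind the figure above.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the N- and C-terminal groups


def molecular_mass(protein: str) -> float:
    """Approximate average molecular mass in Da; a trailing '*' (stop) is ignored."""
    residues = protein.rstrip("*")
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


# First ten residues of ORF28-1 <SEQ ID 158>, for illustration.
fragment = "MLFRKTTAAV"
print(f"{molecular_mass(fragment) / 1000:.2f} kDa")
```

Passing the full-length sequence instead of the fragment gives the estimate for the unprocessed polypeptide; residues outside the 20 standard letters (for example X) raise a `KeyError` rather than being silently guessed.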

Example 20 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>:

1 . . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT . . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>: 

1 . .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 

201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA






1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-l>: 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS
151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS
201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS
301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK
351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS
401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY
451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. 
meningitidis: 

                                                    10        20        30
orf29.pep                                   VSPVLPITHERTGFEGVIGYETHFSGHGHE
                                            |:|:||||||||||||:|||||||||||||
orf29a        EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE
                    50        60        70        80        90       100



VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 



SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 
I I I I II I I , I I I I : I I I Mlllllll 

XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 
170 180 190 200 210 220 



orf29a        MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA
230 240 250 260 270 280 

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 
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951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG

This encodes a protein having amino acid sequence <SEQ ID 168>: 

1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS
151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS
201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX
351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD
451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL*

ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 

10 20 30 40 50 60 

MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 

II Ill I I I I I I I I I I I I I I I I I I I I I I I I I I = 

MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 

! Illlllll II 11:1:: llllll II II :|ll II il I Ilhllllll 

RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 
70 80 90 100 110 120 



orf29a.pep 
orf29-l 

orf29a.pep 
orf29-l 



190 200 210 220 230 240 

APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 

I I I II I I I I I I I I I I I I I I I I II : I I I I I I I I I I I I I I 

APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I II I : I I I I I I I I I I I I I I 
FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

ARQWADAHPNITATAQTALAVAXAATTVWGGKKVELNPTKWDWVKNTGYXTPAVRTMHTL 

I : 1 I f I Illll::| II III I il l, I ||:| |:|| 

AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL 

310 320 330 340 350 360

Homology with a predicted ORF from N. gonorrhoeae

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N. 
gonorrhoeae: 



orf29.pep                                   VSPVLPITHERTGFEGVIGYETHFSGHGHE  30
orf29ng       EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE  102

orf29.pep     VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY  90
              ||||||:||||||||||||||||||||||||| ||||||| |||||:: |||||||||||
orf29ng       VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY  162

orf29.pep     SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG  125
orf29ng       SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR  222



The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein 
having amino acid sequence <SEQ ID 170>: 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG
151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT
251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS
301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD
401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA
451 DGKINHRLFV PNQQLP2K*

In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 

1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc

51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA 

301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC 

451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA 

501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA 

651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA 

751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG 

851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA 

1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT 

1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA 

1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA 

1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT 

1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT

1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA 

1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-l>: 

1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG
151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI
401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS
451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR*

ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap: 

10 20 30 40 50 60 

orf29ng-1.pep MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 

orf29-1       MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

orf29ng-1.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
II I I I I I I I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I 
orf29-1       RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf29ng-1.pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 

190 200 210 220 230 240 

orf29ng-1.pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 

I I I I I I I I I I I I 1:11 111:11:1 : I I 

orf29-1       APFSDRWLKENAGAASGFFSRADEAGKLIWSSDPNKNWWANRNDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf29ng-1.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 
I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I : II I I I I I I I I I I I I I I I I I I I I I I I I 
orf29-1       FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 21 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG... 

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ. . 
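The correspondence between the partial DNA sequence and the ORF30 amino acid sequence can be checked by translating in the first reading frame. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the constant names are illustrative only and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; codons not in the table become 'X'."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Partial sequence SEQ ID 173 as printed above, with block spacing removed.
SEQ_ID_173 = (
    "ATGAAAAAACAAATCACCGCAGCCGTAATGATGCTGTCTATGATTGCCCC"
    "CGCAATGGCAAACGGCTTGGACAATCAGGCATTTGAAGACCAAATGTTCC"
    "ACACGCGGGCAGATGCACCGATGCAG"
)

orf30_partial = translate(SEQ_ID_173)
print(orf30_partial)
```

The printed string should agree with the 42 residues listed for ORF30 once the block spacing is ignored.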

Further work revealed the complete nucleotide sequence <SEQ ID 175>: 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI 

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N. 
meningitidis: 

10 20 30 40 

orf30.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 

orf30a    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 

10 20 30 40 50 60 

orf30a    LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 

70 80 90 100 110 120 
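The identity figure quoted for this overlap can be reproduced by counting matching positions. The sketch below assumes an ungapped alignment that starts at residue 1 of both sequences, which holds for this overlap; the function name `percent_identity` is hypothetical, and the alignment program actually used is not specified here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions over the overlap of two co-aligned, ungapped sequences."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("sequences have no overlap")
    matches = sum(x == y for x, y in zip(a, b))  # zip stops at the shorter sequence
    return 100.0 * matches / overlap


# ORF30 (SEQ ID 174) and the first 50 residues of ORF30a (SEQ ID 178).
ORF30 = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ"
ORF30A_START = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKE"

identity = percent_identity(ORF30, ORF30A_START)
print(f"{identity:.1f}% identity over {len(ORF30)} aa")
```

The single mismatch (M in ORF30, V in ORF30a, at residue 32) gives 41 identical positions out of 42.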

The complete length ORF30a nucleotide sequence <SEQ ID 177> is: 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This encodes a protein having amino acid sequence <SEQ ID 178>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI 

101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 



orf30a.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60 
orf30-1    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60 

orf30a.pep LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120 
orf30-1    LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120 

orf30a.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180 
orf30-1    KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180 

orf30a.pep FX 
orf30-1    FX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N. 
gonorrhoeae: 

orf30.pep  MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 
           |||||||||||||||||||||||||||||||:|||||||||| 
orf30ng    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC 

51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG 

151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT 

301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA 

351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG 

401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT 

451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA 

501 ATCAAAATCT ACGGACAGAT CATCGAAAAA CCGCTTCTAA 



This encodes a protein having amino acid sequence <SEQ ID 180>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG 

101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN 

151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF* 

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap: 



MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 

II I I I I I I II I 

MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 



orf30ng.pep LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II 
orf30-1     LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 



orf30ng.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 22 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT 

201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA 

251 TT . . 

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI . . 
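The X residues in ORF31 arise where the partial DNA sequence contains an ambiguous base, such as the lower-case r (A or G) in the second row. The sketch below illustrates this under stated assumptions: it uses the standard genetic code and a small subset of the IUPAC ambiguity codes, and the helper names `possible_amino_acids` and `translate_ambiguous` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Subset of IUPAC ambiguity codes: R = purine, Y = pyrimidine, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def possible_amino_acids(codon: str) -> set:
    """Return every amino acid an ambiguous codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, writing 'X' where a codon is not uniquely determined."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        candidates = possible_amino_acids(dna[i:i + 3])
        protein.append(next(iter(candidates)) if len(candidates) == 1 else "X")
    return "".join(protein)


# Codons 17 to 19 of SEQ ID 181 (nucleotides 49 to 57): GTG rTA GCC.
candidates_18 = possible_amino_acids("rTA")
fragment = translate_ambiguous("GTGrTAGCC")
print(sorted(candidates_18), fragment)
```

Codon 18 can encode either isoleucine (ATA) or valine (GTA), so it is reported as X, which agrees with residue 18 of ORF31 as listed above.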

Further work revealed a further partial nucleotide sequence <SEQ ID 183>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT. . 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI. . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF31 shows 76.2% identity over an 84aa overlap with a predicted ORF (ORF31.ng) from N. 
gonorrhoeae: 



orf31.pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : : I I I I I II : : I 

orf31ng   MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAF 54 

orf31.pep SFSLLGFSLCLAVGTXNIAFADGI 84 

II I I I I I I I I : I I I II I I I I I 
orf31ng   CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114 

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is: 



1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT 

51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT 

151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT 

201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG 

251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG 

301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA 

351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA 

401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG 

451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC 

501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG 

551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT 

601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA 

651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG 

701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA 

751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP 
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN 
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming 
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897): 

orf31ng 96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154 

GNG+P VNI TP ++G+S N+Y F+V NRG ILNN + T +QLGG IQ NP L 
HecA 45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104 

orf31ng 155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214 

A ++N++ S + S+L GY+EV G+ A VV+ANP GI +G GF+N R TLTTG PQ+ 
HecA 105 AAAILNEVVSPNRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164 

orf31ng 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242 

AG SG +R G+ +I G GLDA +D+ 
HecA 165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193 

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap: 

10 20 30 40 50 60 

orf31-1.pep MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS 

I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I :: I I I I I II I : I 

orf31ng     MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAFC 

10 20 30 40 50 

70 80 
orf31-1.pep FSLLGFSLCLAVGTANIAFADGI 

orf31ng     FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN 
60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 



Example 23 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>: 



1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG. . 

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>: 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>: 



1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG 

351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG 

401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG 

451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC 

501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA 

651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT 

851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC 

951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG 

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC 

1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>: 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 

51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL 

101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG 

151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR 

201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV 

251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH 

301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW 

351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF32 shows 93.8% identity over an 81aa overlap with an ORF (ORF32a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf32.pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP 

Ill II I I I I I I I I 

orf32a    MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX 

10 20 30 40 50 60 

70 80 
orf32.pep CVHQDIHVRTWHSDAADIDTA 

I I I I I I I I 

orf32a    CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX 
70 80 90 100 110 120 

The complete length ORF32a nucleotide sequence <SEQ ID 191> is: 

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC 

251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG 

301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG 

351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA 

401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA 

451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC 

501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC 

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA 

601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA 

651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG 

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG 

751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT 

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT 

851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC 

951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG 

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC 

1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG 



This encodes a protein having amino acid sequence <SEQ ID 192>: 

1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 

51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL 

101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG 

151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR 

201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV 

251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH 

301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW 

351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR* 

ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap: 

10 20 30 40 50 60 

orf32-1.pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP 

orf32a      MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX 

orf32-1.pep CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE 

orf32-1.pep SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS 

II: I Ill: III: Ill 

orf32a      SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 32-1. pep EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA 

I I I I I I I I I I I I I I I I I I I I I I I II : I I I I I I : I : I I I 

orf 32a EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 32-1. pep SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 

I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 32a SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 32-1. pep AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY 



LFGQPSAPEKLAAFVSKHQKIRX 

I I I I I I I I I I I I I I I 

LFGQPSASEKLAAFVSKHQKIRX 
370 380 



Homology with a predicted ORF from N. gonorrhoeae 

ORF32 shows 95.1% identity over an 82aa overlap with a predicted ORF (ORF32.ng) from N. 
gonorrhoeae: 

orf32.pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 57 
orf32ng   MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 60 

orf32.pep DVPCVHQDIHVRTWHSDAADIDTA 81 
orf32ng   DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120 

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK 

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD 

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>: 

1 . ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA 

51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC 

101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG 

151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT 

201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG 

251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG 
301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT 

351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG 
401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC 
451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA 
501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC 
551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG 
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT 
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg 
701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC 
751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT 
801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT 
851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC 
901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC 
951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG 

1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC 
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC 
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT 
1151 AG 

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>: 

1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL 

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV 

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG 

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW 

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF 

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG 

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap: 

10 20 30 40 50 59 

orf32-1.pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 

orf32ng-1   MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf32-1.pep PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE 

orf32ng-1   PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf32-1.pep ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA 
orf32ng-1   ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA 

300 310 320 330 340 350 359 

orf32-1.pep HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR 

1 : M I I I M: I : I I I I I I I I I 

orf32ng-1   HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR 
310 320 330 340 350 360 

370 380 

orf32-1.pep YLFGQPSAPEKLAAFVSKHQKIRX 
I I I I I I I ! I I I I I I I I I I I I I I I 
orf32ng-1   YLFGQPSASEKLAAFVSKHQKIRX 



On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
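The RGD motif mentioned above can be located with a simple substring search. This is a minimal sketch rather than the analysis actually performed; the helper `find_motif` is hypothetical, and only residues 151 to 200 of each protein, as listed in SEQ ID 196 and SEQ ID 190, are used.

```python
def find_motif(protein: str, motif: str, offset: int = 1) -> list:
    """Return the 1-based start positions of every (possibly overlapping) motif occurrence.

    `offset` is the residue number of the first character of `protein`.
    """
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + offset)
        start = protein.find(motif, start + 1)
    return positions


# Residues 151-200 of the gonococcal (SEQ ID 196) and meningococcal (SEQ ID 190) proteins.
GONOCOCCAL_151_200 = "GLIRERDYREAVRFDTEALRRRLVLPEKNAPEWLLFGYRGDVWAKWLDMW"
MENINGOCOCCAL_151_200 = "LIRERDYCEAVRFDTEALRERLMLPEKNASEWLLFGYRSDVWAKWLEMWR"

gono_hits = find_motif(GONOCOCCAL_151_200, "RGD", offset=151)
mening_hits = find_motif(MENINGOCOCCAL_151_200, "RGD", offset=151)
print(gono_hits, mening_hits)
```

In this region the gonococcal fragment contains RGD starting at residue 189, whereas the meningococcal fragment reads RSD at the equivalent position.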

ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the 
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that 
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen. 
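The 42kDa figure quoted for ORF32-1 is consistent with a rough estimate from the length of the open reading frame. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from this document; the function name `approx_mass_kda` is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (an assumption, not from the source).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(orf_nucleotides: int) -> float:
    """Estimate protein mass in kDa from ORF length, excluding the stop codon."""
    residues = orf_nucleotides // 3 - 1
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# SEQ ID 189 is 1149 nucleotides long (final row starts at 1101 and holds 49 bases).
mass = approx_mass_kda(1149)
print(f"{mass:.1f} kDa")
```

An exact mass would require summing the individual residue masses of SEQ ID 190, so this is only an order-of-magnitude check.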



Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>: 



, TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 
GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 
ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 
AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 
GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA 
ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA 
CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 
CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 
TCGCCTGCTA NGGCATCCTG CCGCGCCTG.. 



This corresponds to the amino acid sequence <SEQ ID 198; ORF33>: 

. . LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL . . 

Further work revealed the complete nucleotide sequence <SEQ ID 199>: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 

51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 

151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT 

251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 

1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>: 

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM 

51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL 

101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 

151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA 

301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 

351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. 
meningitidis: 



orf33.pep LFLRVKVGRFFSSPATWFRXKDPVNQAVLR 

orf33.pep LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 

orf33.pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 

I ! I I I I I I I I I : I I I I I I 1 I I I I I I I I I I I I 1 I 

orf33a    VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK 
210 220 230 240 250 260 

orf33a    ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE 
270 280 290 300 310 320 

The complete length ORF33a nucleotide sequence <SEQ ID 201> is: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA 

51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG 

151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT 
251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG 
451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC 
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 
601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC 
651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC 
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT 
801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC 
851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA 
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG 

1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC 

1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA 

This encodes a protein having amino acid sequence <SEQ ID 202>: 

1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM 

51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL 

101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL 

151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL 

201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV 

251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA 

301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA 

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV 

401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT* 

ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 

10 20 30 40 50 60 

orf 33a . pep MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET 

I IN Ml I M I 1111:11 III I ! I 111111 = Mill 

orf 33-1 MLNPSRKLVELVRILDEGC-FIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 33a . pep LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML 

orf 33-1 LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAWLAML 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 33a . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I II I I I I I I I I I I 

orf 33-1 FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 33a . pep VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA 

I I I I I I II I I I I I :::: I I I III : I I I I I I I I I I I I I I I I I I I I I I 

orf 33-1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 33a . pep DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA 
I I II I I I I II I I: I I I I I I I I I I I I I I I I I I I I I I I I I I 
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orf33-l DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 33a . pep DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 
I II I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I ! I I I I I I 
orf 33-1 DTRRETVS AVS PKI I LNDAPKWAVKLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 33a . pep TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVXLLAEQGLSDDLSEKLEHW 

I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

orf 33-1 TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLSDDLSEKLEHW 

370 380 390 400 410 420 

430 440 450 

orf 33a . pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX 

I I I I : I I I I I I I I I I I I I I I I I I I 
orf 33-1 RNALAECGAAWLEPDRAAQEGRLKDQX 

430 440 



Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. 
gonorrhoeae: 

orf33.pep                               LFLRVKVGRFFSSPATWFRXKDPVNQAVLR 30
orf33ng   LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100

orf33.pep LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 90
orf33ng   LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160

orf33.pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143
orf33ng   VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK 220

An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino 
acid sequence <SEQ ID 204>: 

1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF
51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR
101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST
151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL
201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD
251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV
301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA
351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ* 

Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 



ATGTTGaatC 
agggggtTTT 
gccgcgtgga 
atcgACAGGg 
gtcgtTctgG 
TTTCAGgcac 
GTTTTggcgG 
gGCAACGTTG 
CGACGTGGTT 
TATGCGGACC 
GGCGCACAGC 
TGCTGCTGCT 
TTGAGCAATG 
GTCGAAACTC 
GTCTGAACGG 
GGCAGTATCG 



CATCCCgaAA 

cggcAGTACG 
accgtatgtt 
TTATGGGTGG 
ttatCttCTG 
GAGTGTtggG 
TTCCTGCGCG 
TCGGGGCAAA 
AGTGGCGGCA 
TTGTGGCTCT 
TTTGGTGCGG 
CCGCTTCGGT 
GGTTTCCCTG 
CAATATTGCC 
TCTGCTACGG 



ACTGgttgag 
gcgatcctgt 
GAggAaaaaa 
gcgggACaCg 
TggtggCAtC 
ATGGACaatC 
CATGaatacG 
TGAAAGTGGG 
GGCCCTGTAA 
ACCTTCGGTA 
GCACGCTGCT 
CAATATACGT 
ACGCGCGGTG 
TCCCCGATGC 
GATGCGCGGG 
CATCCTGCCG 



ctGgTCCgtA 
gcaggcgacg 
tcttccgtcg 
TtggaacGTG 
gATGATGTtt 
AGGGGCtGAA 
ctgATGCTGG 
ACGGTTTTTC 
ATCAGGCGGT 
CGATGGAAAA 
CGGAATGCTG 
TCAACTGGGA 
GAAATGTTGG 
GCGGGCGGTC 
CTTGGTCGGG 
CGCCTCTTGG 



Ttttgaataa 
gaggctttgc 
GGCGGAGAtg 
TGCGTGCggg 
aCCGCCGGAT 
TtTCTTTTTA 
CAGTATGGtt 
AGCAGTCCGG 
GTTGCGGCTG 
TAGGCGCAAC 
GTGTCGGTAT 
AAGCACGCTG 
CATGGCTGCC 
ATCGAAGGTC 
GCTGCTGGTC 
CTTGGGTAGT 
1001 
1051 
1101 
1151 
1201 
1251 
1301 



GTGTAAAATC 
CCTATTATCA 
GATACGCGTC 
CGATGCGCCG 
AATGGTTCGA 
GCCAATCGGG 
GGCGCAACTG 
TGCTGCGGCA 
GTGCAGCTTT 
GGAACATTGG 
CTGACAGGGT 



CTTTTGAAAA 
GGCGGTCATC 
GGGAAACCGT 
AAATGGGCGC 
GGGCAGGCTG 
AACAGGTTGC 
CTTATCGGCG 
GATTGTGCGG 
TGGCGGAACA 
CGTAACGCGC 
GGCGCAGGAA 



CAAGCGAAAA 
CGCCGCTGGC 
GTCCGCCGTT 
TCATGCTGGA 
GCGCAGGAAT 
CGCGCTGGAG 
TACGCGCCCA 
CTTTCGGAAG 
GGGGCTTTCA 
TGACCGAATG 
GGCCGTTTGA 



CGGattgGAT 
AGAACAAAAT 
TCGCcgaAAA 
GACCGAGTGG 
GGCTGGATAA 
ACAGAGCTGA 
AACTGTGCCG 
CGGCGCAGGG 
GACGACCTTT 
CGGCGCGGCG 
AAGACCAATA 



TTGGAAAAAA 
CACCGATGCG 
TCGTCTTGAA 
CAGGACGGCC 
GGGCGTTGCC 
AGCAGAAACC 
GACCGGGGCG 
CGGCGCGGTG 
CGGAAAAGCT 
TGGCTTGAGC 



This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>: 

1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM
51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL
151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ* 

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap: 



orf 33-1 .pep 
orf 33ng-l 



MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET 
MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT 



orf33-l .pep 
orf33ng-l 



LERVRAGSFWLWWAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML 
LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL 



orf 33-1. pep 
orf33ng-l 



FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 



orf 33-1. pep 
orf 33ng-l 



orf 33-1. pep 
orf33ng-l 



orf 33-1 .pep 
orf33ng-l 



190 200 210 220 230 240 

VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 

250 260 270 280 290 300 

DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 

I I : I I I I I I I I I I II 

DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA 

250 260 270 280 290 300 

310 320 330 340 350 360 

DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

I I I I I I I I I I I I I h I I : I 1 I I: I I I I I : I I I I I I I ! I 

DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE 

310 320 330 340 350 360 



orf 33-1. pep 
orf33ng-l 



RNALAECGAAWLEPDRAAQEGRLKDQX 
orf33ng-1 RNALTECGAAWLEPDRVAQEGRLKDQX 
430 440 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>: 

1 . . CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT . GAGTGCG 

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC . GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGT1 CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC.. 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 . .QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 
51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 
101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 
151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT 

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT 

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA 

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC 

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT 

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC 

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA 

901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 

951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>: 

1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL
51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG
101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN
151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV
201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG
251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP
301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV
351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF
401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR
451 HAV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N. 
meningitidis: 



orf34 .pep 
orf34a 



QKSLSR ISLWGLGGVFFGVSGLV WFSLG VSXE CAC 

MMXPXIMLPWIAGVPA VPGQKRLSR XSLWGLGGXFFGVSGLV WFSLG VSXSLGVSXGCAC 



orf34 .pep 
orf34a 



FSGV SFRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : : : I : : I I I I I I 

FSGV SFRGSGRG T FVG STGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 



orf34 .pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 

orf34a AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



orf34.pep S 

orf34a PFGXNVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence <SEQ ID 211> is: 

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 

151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 

301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 

401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 

551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC 

751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG 

851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 

901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 

1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 

1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 

1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC 

1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 

1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 212>: 

1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX
51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA
101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR
201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD
251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA
301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL
351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS
401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS
451 DGIALRHAV* 

ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 

10 20 30 40 50 60 

MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 

MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 

10 20 30 40 50 

70 80 90 100 110 120 

FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRX FXGAAGDGSP 

FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
60 70 80 90 100 110 



orf 34a. pep 
orf34-l 

orf34a.pep 
orf34-l 



LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 
LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 



LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 



DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAVVA 
I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I 
DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
300 310 320 330 340 350 



DNGDLGRVXFGLWLAQIGAGGGFDTQRHYVWGXRAGGSAVDGGFRADRRAADDCADAA 
DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA 



orf34a.pep 
orf34-l 



AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
I : I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N. 
gonorrhoeae: 

orf 34. pep QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 
orf34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 60 

orf34 .pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 90 

or f34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR — GLTRFFLGA 114 

orf 34 .pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150 

I I I I I IN : I I I I I I I I I I I I : I I I I 

orf34ng AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 

orf 34. pep S 175 

orf34ng PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 

The complete length ORF34ng nucleotide sequence <SEQ ID 213> is: 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 

151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 

301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 

401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 

551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT 

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 

751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT 

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 
851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 
901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 
951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 

1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 

1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 

1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 

1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 

1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 214>: 

1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF
51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA
101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR
201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND
251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA
301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL
351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS
401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS
451 DGIALRHAV* 

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 

10 20 30 40 50 

orf 34-1 . pep MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS LGCAC 

I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I : I I I I I I I I I I I I II I I I I I I I I 

orf34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 



60 70 80 90 100 110 

orf 34-1 . pep FSGVSFRGSGRGT FVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 

orf34ng FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 
orf34-1.pep 
orf34ng 



LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 



orf 34-1. pep 
orf 34ng 



orf34-l.pep 
orf34ng 



orf 34-1. pep 
orf34ng 



DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
DFGRVPSVAGDVARSARQGGDGNVWYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 



orf 34-1 .pep 
orf 34ng 



DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA 



orf 34-1. pep 
orf34ng 



AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX 



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGXAAAAAA GAAATCGTCT TCGGCACGAC 

151 CGTCGGCGAC TTCGGCGATA TGGT CAAAGA ACAAAT CCAA GCCGAGCTGG 

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC 

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>: 



Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 






251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT 

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>: 

1 MKTFFKTLSA AALALILAA C GGQKDSAPAA SASAAADNGA AKKEIVFGTT 

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 
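
The correspondence between a listed nucleotide sequence and its stated translation can be spot-checked in silico. The following is a minimal sketch, not part of the original disclosure, assuming the standard genetic code and reading frame 1. The function name `translate` and the choice to report unresolved codons (for example those containing N) as X are illustrative assumptions.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon, if any
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 nucleotides of SEQ ID 217, as listed above.
print(translate("ATGAAAACCT TCTTCAAAAC CCTTTCCGCC"))  # MKTFFKTLSA
# Nucleotides 841-864 of SEQ ID 217 (end of row 801 plus row 851).
print(translate("TGGAATGAAG GCGCAGCCAA ATAA"))  # WNEGAAK*
```

Both fragments reproduce the start and the end of the ORF4-1 translation given above, including the terminal stop shown as `*`.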

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N. 
meningitidis: 

10 20 30 40 50 59 

orf 4 .pep MKT F FKTLSAAALAL I LAA CG-QKDS APAASAS AAADNGAAKKE I VFGTTVGDFGDMVKE 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf 4a MKT FFKTLSAAALAL I LAA CGGQKDSAPAASASAAADNGAAXKE I VFGTTVGDFGDMVKE 

10 20 30 40 50 60 



60 70 80 90 

orf 4 . pep QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL 

II I I I I I I I I I I I I I I I I I I I I I I 

orf 4a XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ 
70 80 90 100 110 120 



orf 4a VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX 
130 140 150 160 170 180 

The complete length ORF4a nucleotide sequence <SEQ ID 219> is: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT 

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 

551 NNNNANNNNT NNNNNNNNNK NNNNNCNNCC- NNNNNNNANN NNNNNNNNNN 

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 



1 MKTFFKTLSA AALALILAA C GGQKDSAPAA SASAAADNGA AXKEIVFGTT 
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH 

101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND 

151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX 

201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

A leader peptide is underlined. 



Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC
151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA
201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC
301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA
351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT
501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG
551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG
601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC
651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTA.AAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA 



This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>: 

1 MKTFFKTLSA AALALILAA C GGQKDSAPAA SASAAADNGA AKKEIVFGTT 

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap: 



orf4a-l 
orf4-l 



MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 

1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 it i i 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 



orf4a-l QIQPELEKKG YTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 

Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 4-1 QIQAELEKKGYTVKLVE FTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf4a-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 

orf4-l VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf4a-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 

orf4-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
190 200 210 220 230 240 

250 260 270 280 

orf4a-l AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 

or f 4-1 AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 
250 260 270 280 
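
The identity figure quoted for this comparison can be reproduced from the two listed sequences. The sketch below is an illustration only: it assumes the sequences are already aligned (gaps written as `-`), counts identical residues per alignment column, and divides by the number of columns. The original alignments were produced by a FASTA-type program whose exact overlap definition is not stated here, so the helper `percent_identity` is a hypothetical stand-in rather than the method actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns with the same residue in both rows.

    Gap columns ('-') count toward the overlap length but never as matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# ORF4-1 (SEQ ID 218) as listed above, without the terminal stop symbol.
ORF4_1 = "".join("""
MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT
VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH
KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND
PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL
PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK
""".split())

# ORF4a-1 (SEQ ID 222) differs only at residue 64 (A in ORF4-1, P in ORF4a-1).
ORF4A_1 = ORF4_1[:63] + "P" + ORF4_1[64:]

print(len(ORF4_1))                                   # 287
print(round(percent_identity(ORF4_1, ORF4A_1), 1))   # 99.7
```

With one substitution in 287 aligned positions, 286/287 rounds to the 99.7% stated for ORF4a-1 and ORF4-1.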
Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869). 

ORF4 and this outer membrane protein show 33% aa identity in 91aa overlap: 

10 20 

lip2 .pasha MNFKKLLGVALVSALALTACKDEKAQAP 

I I I :: I I 111:11 : I : I 

ORF4 

lip2 . pasha -ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD 
ORF4 



Homology with a predicted ORF from N. gonorrhoeae 

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. 
gonorrhoeae: 






orf4nm.pep MKTFFKTLSAAALALILAACGXQKDSAPAA 

orf4nm.pep SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 

orf4ng SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALA 
260 270 280 290 300 310 

90 

orf4nm.pep EGEL 
orf 4ng 

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a 
protein having amino acid sequence <SEQ ID 224>: 

1 MKTFFKTLST ASLAL ILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG 

101 CCCCTTCTGC CGATAACGgc g CgGCGAAAA AAGAAAtcgt CtTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC 

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA 
601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA 

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>: 

1 MKTFFKTLSA AALALILAA C GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPN LALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

This shows 97.6% identity in 288 aa overlap with ORF4-1: 



orf 4-1 .pep 
orf 4ng-l 



MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 



orf 4-1 .pep 
orf 4ng-l 



orf 4-1 .pep 
orf 4ng-l 



orf 4-1 .pep 
orf 4ng-l 



orf 4-1 . pep 
orf 4ng-l 



EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 



70 



90 



100 



110 



120 



120 130 140 150 160 170 179 

QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 

I I I I I I I I I I I I I 1:1111: I I I I I I I I I I I I 

QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
130 140 150 160 170 180 

180 190 200 210 220 230 239 

KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I 
KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 
190 200 210 220 230 240 

240 250 260 270 280 

SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 
250 260 270 280 



In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the 
database: 



ID   LIP2_PASHA     STANDARD;      PRT;   276 AA. 
AC   Q08869; 
DT   01-NOV-1995 (REL. 32, CREATED) 
DT   01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 
DT   01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
DE   28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . . 
SCORES Init1: 279 Initn: 416 Opt: 494 

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap 



orf4ng-1.pep MKTFFKTLSAAAL--ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 
lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 



orf4ng-1.pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 
120 130 140 150 160 170 

orf 4ng-l . pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

: : : I : : I I : I : : I : I I I : I I : I I : I I I I I I : : I : I : I I I I I : 
lip2 pasha IGNTLVWPIAAYSKKIKNISELKDGATVAIPNNASNTARALLLLQAHGLLKLKDPKN-VF 



orf 4ng-l . pep 
lip2_pasha 


240 250 260 270 280 289 

orf 4ng-l . pep YVNW S AVKTADKD S QWLKDVTEAYN S DAFKAYAHKR FEG YKY PAAWNEGAAKX 

lip2 pasha YVNLVVSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGVVKGW 
240 250 260 270 

Based on this analysis, including the homology with the outer membrane protein of Pasteurella 
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment 
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
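
A putative lipoprotein lipid attachment site of the kind referred to above is conventionally detected by matching the N-terminal region against a consensus pattern ending in the cysteine that becomes lipid-modified. The sketch below is illustrative only. The regular expression is written in the style of the PROSITE prokaryotic lipoprotein signature and should be treated as an assumption to be checked against the current PROSITE entry, as should the 40-residue search window. The function name `lipid_attachment_cysteine` is hypothetical.

```python
import re

# Assumed consensus: six residues that are not D/E/R/K, a short hydrophobic
# stretch, a small residue, then the lipid-modified cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def lipid_attachment_cysteine(protein: str, n_terminal_window: int = 40):
    """Return the 1-based position of the candidate cysteine, or None.

    Only the first `n_terminal_window` residues are searched (an assumption).
    """
    match = LIPOBOX.search(protein.upper()[:n_terminal_window])
    if match is None:
        return None
    # match.end() is exclusive and 0-based, so it equals the 1-based
    # position of the last matched residue, the cysteine.
    return match.end()


# N-terminus of ORF4ng-1 (SEQ ID 226) as listed above.
orf4ng_1_start = "MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNG"
print(lipid_attachment_cysteine(orf4ng_1_start))  # 20
```

Under these assumptions the match ends at Cys-20, the cysteine in the `LAACGG` stretch of the listed gonococcal sequence.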

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and 
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion 
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA 
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay 
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a 
useful immunogen. 

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
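
The scale and window used for the plots in Figure 8F are not stated in this passage. As a rough illustration of how a per-residue profile of this kind is typically computed, the sketch below averages the Kyte-Doolittle hydropathy scale over a sliding window. This is a swapped-in, commonly used scale in which positive values mark hydrophobic stretches, so it runs in the opposite sense to a hydrophilicity plot. The window length of 7 and the function name `hydropathy_profile` are assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy for each full window along the sequence.

    Residues outside the 20 standard amino acids (e.g. 'X') score 0.0.
    """
    seq = protein.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


# First 20 residues of ORF4-1 (SEQ ID 218), covering the putative leader.
profile = hydropathy_profile("MKTFFKTLSAAALALILAAC", window=7)
print(len(profile), round(max(profile), 2))
```

A sustained run of positive window averages near the N-terminus would be consistent with a hydrophobic leader, but no threshold is implied by the original text.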
Example 27 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 
51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 
101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 
151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA 
201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 
251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 
301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 
351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 
401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG 
451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 
501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 
551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 
601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC... 
701 GC AGACACGCCC GCCGCATCCG 
751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 
801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 
851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA 
901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 

1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 
51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 
101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 
151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 
201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 
301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results: 
Sequence motifs 

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface 
localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial 
adhesion events. 
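
The two observations above (proline content and the RGD motif) can be checked with a few lines of code. This is a minimal sketch; the helper names `proline_fraction` and `find_motif` are hypothetical, and the input is only the first 44 residues of ORF8 as listed above.

```python
def proline_fraction(protein):
    """Fraction of residues that are proline."""
    protein = protein.upper()
    return protein.count("P") / len(protein) if protein else 0.0


def find_motif(protein, motif="RGD"):
    """Return 1-based start positions of every occurrence of the motif."""
    protein, motif = protein.upper(), motif.upper()
    return [
        i + 1
        for i in range(len(protein) - len(motif) + 1)
        if protein[i:i + len(motif)] == motif
    ]


# First 44 residues of ORF8 <SEQ ID 228> as given in the listing above
orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"

print("proline fraction:", round(proline_fraction(orf8_start), 3))
print("RGD start positions:", find_motif(orf8_start))
```

Positions are counted within the fragment supplied, so they need to be offset when a longer sequence is analysed.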
Homology with a predicted ORF from N. gonorrhoeae 

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N. 
gonorrhoeae: 



orf8ng     1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50 
orf8.pep   1       PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 44 

orf8ng    51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100 
orf8.pep  45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94 

orf8ng   101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 150 
orf8.pep  95 HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRMQQNDCRNQQRQ 144 

orf8ng   151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200 
orf8.pep 145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194 

orf8ng   201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250 
orf8.pep 195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q 244 

orf8ng   251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH 300 
orf8.pep 245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH 294 

orf8ng   301 PPQMAGCPRTPTPAPKPA* 319 
orf8.pep 295 PPQMAGCPRTPTPAPKPA* 313 





The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein 
having amino acid sequence <SEQ ID 230>: 

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP 
201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ 
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 
301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis 
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 28 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>: 

1 . . GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 

101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 
201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG 

251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG 
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA 
501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 
551 GTTATCCTTT CCCGACCGG.. 

This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 
51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 
101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD 
151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT . . 

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 

501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 

651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 

751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 

1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 
1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>: 



1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 
151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE 
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA 
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI* 

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further 
computer analysis of this amino acid sequence gave the following results: 



Homology with the baf protein of B. pertussis (accession number U12020) 
ORF61 and baf protein show 33% aa identity in 166aa overlap: 

orf61   23 LLLDGGNSRLKWAWVE-NGTFATVGSAPYR----DLSPLGAEWAEKADGNVRIVGCAVCG 77 
           +L+D GNSRLK  W + +   A    AP      DL  LG   A       R +G  V G 
baf      3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 

orf61   78 EFKKAQVQEQLAR---KIEWLPSSAQAXGIRNHYRHPEEHGSDRW---FNALGSRRFSRN 131 
             +   +   L      I WL +   A G+RN YR+P++ G+DRW      L  + 
baf     63 LARGEAIAATLRAGGCDIRWLRAQPLAMGLRNGYRNPDQLGADRWACMVGVLARQPSVHP 122 

orf61  132 ACVVVSCGTAVTVDALTDDGHYLGXGTIMPGFHLMKESLAVRTANL 177 
             +V S GTA T+D +  D  + G G I+PG  +M+ +LA  TA+L 
baf    123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. 
meningitidis: 



orf61.pep                                EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30 
orf61a     TVFEGTVKGVDGQGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 348 

orf61.pep  RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90 
orf61a     RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR 408 

orf61.pep  KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 150 
orf61a     KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 468 

orf61.pep  GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189 
orf61a     GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM 527 

orf61a     HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG 587 

The complete length ORF61a nucleotide sequence <SEQ ID 235> is: 



1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT 
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT 
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA 
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA 
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG 
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA 



This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA 
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT* 



ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 



orf61a.pep  MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 
orf61-1     MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 

orf61a.pep  LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 
orf61-1     LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 

orf61a.pep  GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN 180 
orf61-1     GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180 

orf61a.pep  DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 
orf61-1     DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 

orf61a.pep  AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 
orf61-1     AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 

orf61a.pep  QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360 
orf61-1     QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360 

orf61a.pep  ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420 
orf61-1     ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420 

orf61a.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480 
orf61-1     GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480 

orf61a.pep  HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 
orf61-1     HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 

orf61a.pep  VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 593 
orf61-1     VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593 
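
Percent identity figures such as the one quoted for ORF61a and ORF61-1 can be recomputed from any pair of aligned rows. The following is a minimal sketch that counts identical residues over the columns where neither row has a gap; alignment programs differ in how they define the overlap, so the function `percent_identity` (a hypothetical name) will not necessarily reproduce the reported values exactly. The example uses the final aligned segment of the two proteins, without the terminal X.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Final aligned segment of ORF61a and ORF61-1 (residues 541-592)
orf61a_tail = "VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHT"
orf61_1_tail = "VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHI"

print(round(percent_identity(orf61a_tail, orf61_1_tail), 1))
```

Concatenating all aligned rows before calling the function gives the identity over the whole overlap rather than over one segment.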



Homology with a predicted ORF from N. gonorrhoeae 

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
gonorrhoeae: 

orf61.pep                                EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30 
orf61ng    TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211 

orf61.pep  RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90 
orf61ng    RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271 

orf61.pep  KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 150 
orf61ng    KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 331 

orf61.pep  GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189 
orf61ng    GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 390 
An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino 
acid sequence <SEQ ID 238>: 

1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD 
51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 
101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG 
151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 
201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 
251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 
301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL 
351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 
401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 
451 ESEHA* 

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>: 

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY 
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG 
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA 
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK 
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap: 
orf61ng-1.pep  MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 
orf61-1        MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60 

orf61ng-1.pep  LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 
orf61-1        LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120 

orf61ng-1.pep  GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180 
orf61-1        GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180 

orf61ng-1.pep  DLVVGRDKLGGILIETVRAGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 
orf61-1        DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240 

orf61ng-1.pep  AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300 
orf61-1        AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300 

orf61ng-1.pep  RGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360 
orf61-1        QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360 

orf61ng-1.pep  ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420 
orf61-1        ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420 

orf61ng-1.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480 
orf61-1        GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480 

orf61ng-1.pep  HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540 
orf61-1        HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540 

orf61ng-1.pep  VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593 
orf61-1        VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593 



Based on this analysis, including the homology with the baf protein of B.pertussis and the presence 
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 29 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 241>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC . . 



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 
1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGC . . 
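
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating codon by codon with the standard genetic code. The sketch below is illustrative only: the helper name `translate` is hypothetical, codons containing ambiguity codes or unreadable bases are reported as X, and the reading frame is assumed to start at the first base supplied.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces used in listings
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of <SEQ ID 241>, written as in the listing
orf62_start = "ATGTTTTACC AAATCCTTGC CCTGATTATC"
print(translate(orf62_start))
```

Because whitespace is removed first, rows can be pasted from the listings as they are, provided the position numbers are left out.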

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 

This corresponds to the amino acid sequence <SEQ ID 244; ORF62-l>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number 057147) 
ORF62 and HI0976 show 50% aa identity in 114aa overlap: 

Orf62    1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 
           M YQILAL+IWSSS I  K  Y  +DP L+V VR             R   KI +   K 
HI0976   1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60 

Orf62   61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 
           L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   + 
HI0976  61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. 
meningitidis: 

orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 
orf62a     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 
orf62a     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 
The complete length ORF62a nucleotide sequence <SEQ ID 245> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 



This encodes a protein having amino acid sequence <SEQ ID 246>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 
201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 
251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK* 

ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 

orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240 
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240 

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285 
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285 

Homology with a predicted ORF from N. gonorrhoeae 

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. 
gonorrhoeae: 
orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 
orf62ng    MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60 

orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 
orf62ng    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120 

orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 
orf62ng    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180 

orf62.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216 
orf62ng    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240 



The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 
101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 
401 CGGAAGAGGG            GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 
501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 
751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 
851 ACGCGCAAAA CGGCAATGCC GTCTGA 



This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*
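
The relationship between the nucleotide listing and the encoded protein can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the table layout are illustrative choices and are not taken from the specification. It translates the first 50 nucleotides of SEQ ID 247 and the final six.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace between blocks is ignored.

    Unrecognised codons become 'X'; a trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)


# First row of SEQ ID 247 as listed (five blocks of ten nucleotides).
first_row = "ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC"
print(translate(first_row))   # start of SEQ ID 248
print(translate("GTCTGA"))    # last two codons: valine, then stop
```

The first call reproduces the opening residues of SEQ ID 248, and the second shows the terminal valine followed by the stop codon that the listing marks with an asterisk.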

ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 



                    10        20        30        40        50        60
orf62ng.pep MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
            |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf62ng.pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA

orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf62ng.pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA

orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf62ng.pep AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI
                   190       200       210       220       230       240






                   250       260       270       280       290
orf62ng.pep SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX

orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX
                   250       260       270       280

Furthermore, ORF62ng shows significant homology to a hypothetical H. influenzae protein:

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128

Score = 106 bits (262), Expect = 2e-22

Identities = 56/114 (49%), Positives = 68/114 (59%)

Query: 1   MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
           M YQILAL+IW SS I  K  Y  +DP L+V VR             R   KI +   K
Sbjct: 1   MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
           L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
Sbjct: 61  LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114
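
The percentages in the BLAST summary lines follow directly from the match counts. The sketch below is an illustration only: the helper name `blast_percent` is hypothetical, and the use of truncation rather than rounding is an inference from the figures in the text (68/114 is 59.6%, yet 59% is reported), not a statement about how the original software was implemented.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage with truncation, matching the figures quoted in the text."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // length


# Counts quoted for the HI0976 comparison.
identities = blast_percent(56, 114)
positives = blast_percent(68, 114)
print(f"Identities = 56/114 ({identities}%), Positives = 68/114 ({positives}%)")
```

Rounding to the nearest integer would give 60% for the positives, so truncation is the reading that agrees with the reported values.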



Based on this analysis, including the homology with the transmembrane protein of H. influenzae
and the putative leader sequence and several transmembrane domains in the gonococcal protein,
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 30 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 SGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGgtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA 

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT 

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG 

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC. . 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 
301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT..
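
The partial nucleotide sequence contains lower-case IUPAC ambiguity codes (m, w, s, k, y, r), and the corresponding positions of the deduced protein are shown as X. One way to reproduce that behaviour is sketched below, assuming the standard genetic code and the standard IUPAC code table; the function name `translate_ambiguous` and the rule "emit X unless every expansion of the codon encodes the same residue" are assumptions made for illustration, not a description of the software actually used.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, writing 'X' wherever the residue is not determined."""
    seq = "".join(dna.split()).upper()
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if any(base not in IUPAC for base in codon):
            residues.append("X")  # gap characters such as '.' cannot be resolved
            continue
        options = {
            CODON_TABLE["".join(combo)]
            for combo in product(*(IUPAC[base] for base in codon))
        }
        residues.append(options.pop() if len(options) == 1 else "X")
    return "".join(residues)


# First row of SEQ ID 249 as listed.
row = "ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA"
print(translate_ambiguous(row))
```

Under this rule the codon GCm still resolves to alanine, because both GCA and GCC encode it, while Gwm, sTC and kkG are reported as X, which agrees with the start of SEQ ID 250.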

Further work revealed the complete nucleotide sequence <SEQ ID 251>:

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 

1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP

151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI

201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL

251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL

451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT

501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA

551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ

601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH

651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK

701 TVKTYA*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N.
meningitidis:

orf64.pep   MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
            ||||||||||||  |  |||||||||||||||||||||||||||||||||||||||||||
orf64a      MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK

orf64.pep   DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN

orf64a      DRRDGVFGSQIAKR-LSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLN
                    70         80        90       100       110

orf64.pep   LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE

orf64a      LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE

orf64.pep   KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP
            ||||||||||||||||||||||:|||||| ||||||||| ||||| || |||||||||||
orf64a      KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP
            180     190       200       210       220       230

orf64.pep   VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV
orf64a      VPKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV

orf64.pep   EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA
            |||||||||||||||||||||||||||||||||| |||||||||||||:|||||||||||
orf64a      EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA
            300     310       320       330       340       350

orf64.pep   ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT



The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA
51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG
251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG
351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG
401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC
451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC
551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC
601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT
701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA
751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA
851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA
1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG
1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTGTTG GCATTGTACG AAGCTGGTGC GTGCCGGTTT GCGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG
1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC
1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG
2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC
2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G

This encodes a protein having amino acid sequence <SEQ ID 254>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*

ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap:

                    10        20        30        40        50        60
orf64a.pep  MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf64a.pep  DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1     DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf64a.pep  SKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIEK
            ||||||||||||||||:||||| ||||||| |||||||||||||||||||||||||||||
orf64-1     SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf64a.pep  SINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQPV

orf64-1     SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf64a.pep  PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE

orf64-1     PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE

orf64a.pep  PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA

orf64-1     PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA

orf64a.pep  RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL

orf64-1     RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL

orf64a.pep  AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQK

orf64-1     AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf64a.pep  EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK

orf64-1     EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK

orf64a.pep  EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ

orf64-1     EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ

orf64a.pep  VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK

orf64-1     VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK

orf64a.pep  PAGTGLXLPVVKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX

orf64-1     PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX
                   670       680       690       700



Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N.
gonorrhoeae:

orf64.pep   MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60
            ||||||||||||  |  ||||||||||||||||||||:||||||||||||||||||||||
orf64ng     MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 60

orf64.pep   DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120

orf64ng     DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119

orf64.pep   LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180

orf64ng     LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 179

orf64.pep   KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240

orf64ng     KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239

orf64.pep   VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300

orf64ng     IPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFV 299

orf64.pep   EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360

orf64ng     EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359

orf64.pep   ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 394

orf64ng     ARHYLECVLDGLTTGVVVSYPLSCCRTAVFSTCHSSPLSYF 400

An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino 
acid sequence <SEQ ID 256>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD

351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA 

51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG 

251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG 

351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA 

401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG

451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC 

551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT 

601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA

651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC 

701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG 

751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA 

901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT 

951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA 

1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT 

1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT 

1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT 

1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT 

1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA 

1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG 

1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC 

1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG 

2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 
This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*



ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:

                      10        20        30        40        50        60
orf64ng-1.pep MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK
              |||||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf64-1       MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf64ng-1.pep DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL
              |||:|||||||||||||||||||||||:||||:|||||||||||||||||||||||||||
orf64-1       DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf64ng-1.pep SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIEK

orf64-1       SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf64ng-1.pep SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI

orf64-1       SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf64ng-1.pep PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFVE

orf64-1       PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf64ng-1.pep PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA

orf64-1       PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA
                     310       320       330       340       350       360

                     370       380       390       400       410       420
orf64ng-1.pep RHYLECVLDGLTTGVVVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL

orf64-1       RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL

orf64ng-1.pep AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIRAQK

orf64-1       AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK

orf64ng-1.pep EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTRSTDTIIKQVAALK

orf64-1       EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK

orf64ng-1.pep EMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ

orf64-1       EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ

orf64ng-1.pep VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK

orf64-1       VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK
                     610       620       630       640       650       660

                     670       680       690       700
orf64ng-1.pep PAGTGLGLPVVKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX

orf64-1       PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX
                     670       680       690       700

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771

Gaps = 58/720 (8%)

Query: 7   IAAICAVVLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66
I+A+ ++L GLT + + + R++KRG
Sbjct: 35  ISALATFLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90

Query: 67  FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126
+++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ +
Sbjct: 91  AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150

L F++ V PI 



I VL G+ GV+ D 4 



N++AE++LG L+ H 



:XXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG— NGWM 4 67 

+ VQ D + + V E + +G V+ 

-EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 488 



+DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL K G 
Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698

NG G +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
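
The specification does not say how the putative transmembrane domains were identified. As an illustration of one common approach, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile; the choice of this scale, the 19-residue window, the 1.6 threshold and the function names are all assumptions introduced here, not the method used in the original analysis.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in protein]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# N-terminal 60 residues of ORF64ng-1 (SEQ ID 258).
nterm = "MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK"
print(hydrophobic_windows(nterm))
```

Windows that clear the threshold are only candidates for membrane-spanning segments; a charged stretch such as residues 350 to 368 of the same protein scores well below zero with the same function.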



Example 31 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG 

451 CACGCGTTGG ATACG . . . 

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 



1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG
151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC 

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical protein o221 of E. coli (accession number P37619) 
ORF66 and o221 protein show 67% aa identity in 155aa overlap: 
orf66 1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60

M F+ Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV 
o221 1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60 

orf66 61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120 

RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA 
o221 61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120 

orf66 121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155 

+GQILD+ VFN+LR+ + WW+AP AST+ G+ DT 
o221 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf66.pep   MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV

orf66a      MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV

orf66.pep   RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf66a      RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA

orf66.pep   IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT

orf66a      LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF



The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP 

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 

101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG 

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV 

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP* 

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 

                    10        20        30        40        50        60
orf66a.pep  MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV
            |||||||||||||| |||||||||||||||||||||| ||||||||||||||||||||||
orf66-1     MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf66a.pep  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA

orf66-1     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf66a.pep  LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF

orf66-1     IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
                   130       140       150       160       170       180

                   190       200       210       220
orf66a.pep  VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX

orf66-1     VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
                   190       200       210       220



Homology with a predicted ORF from N. gonorrhoeae 

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N. 
gonorrhoeae: 

orf66.pep   MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
            |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng     MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66.pep   RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
            |||||||||||||||||||| ||||||||||||||||||| |:|||||||||||||||||
orf66ng     RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf66.pep   IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
            :|||||||||:|||||||||||| ||||||:||||
orf66ng     LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
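
The identity marks in an alignment printout of this kind can be regenerated from the two aligned rows. The following is only an illustrative sketch: the function name `match_line` is hypothetical, and it marks identical positions with `|` and everything else with a space. The `:` similarity marks in the printout depend on a scoring matrix that the text does not specify, so they are not reproduced here.

```python
def match_line(top: str, bottom: str) -> str:
    """Return a row with '|' under identical residues and ' ' elsewhere.

    Both rows are assumed to be already aligned (same column = same position).
    Only the overlapping columns are compared.
    """
    return "".join("|" if a == b else " " for a, b in zip(top, bottom))


if __name__ == "__main__":
    # First ten residues of ORF66 and ORF66ng as listed in the text.
    top = "MYAFTAAQQQ"
    bottom = "MYALTAAQQQ"
    print(top)
    print(match_line(top, bottom))
    print(bottom)
```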

The complete length ORF66ng nucleotide sequence <SEQ ID 265> is: 



1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat 

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC 

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG 

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 266>: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP 

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA 

101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG 

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV 

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP* 

An alternative annotated sequence is: 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFS FP 

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFS VLF HNGSWTGLGA 

101 LSQFNTFVGR IA LASFAAYA LGQILDIFV F DKLRRLKAWW IAPAAS TVIG 

151 NALDTLVFFA VAF YASSDEF MAANWQGIAF VDYLFKLT VC TLFFLPAYGV 

201 ILNLL TKKLT ALQTKQAQDR PVPSLQNP* 
ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap: 

orf66-1.pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
            |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng     MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66-1.pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf66ng     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

orf66-1.pep IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180
            :|||||||||:||||||||||||:|||||||||||||||||||||||| |||||||||||
orf66ng     LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

orf66-1.pep VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
            ||||||||||||||||||||||||||||||:||||||||||:|||||||
orf66ng     VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229

Furthermore, ORF66ng shows significant homology with an E. coli ORF: 

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC 
REGION (O221) 

>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607 
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423) 
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli] 
Length = 221 

Score = 273 bits (692), Expect = 5e-73 

Identities = 132/203 (65%), Positives = 155/203 (76%) 

Query: 1    MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
            M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
Sbjct: 1    MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61   RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
            RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61   RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121  LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
            LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA
Sbjct: 121

Query: 181
            +FFLP YGV+LN
Sbjct: 181

Based on this analysis, including the homology with the E. coli protein and the presence of several 
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 32 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 
51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT 
101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 
151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 
201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 
251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 
301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC 
351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 
401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA 
451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA 
501 TGGCTGCTAC GGCGTTGAT.. 

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>: 



1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF 
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 
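
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the reading frame codon by codon. The sketch below is an illustration only and is not a description of the software used for this work. It assumes the standard genetic code, treats IUPAC ambiguity codes (such as `y`, `m`, `w`, `r` in the partial sequence) by reporting a residue only when every possible codon agrees, and writes `X` otherwise; the names `translate`, `CODON_TABLE` and `IUPAC` are hypothetical.

```python
from itertools import product

# Standard genetic code, indexed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; whitespace and case are ignored.

    A codon containing ambiguity codes yields a residue only if all of its
    possible expansions encode the same residue; otherwise 'X' is written.
    Unrecognised symbols are treated like 'N'. Stop codons are written as '*'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        options = [IUPAC.get(base, "ACGT") for base in seq[pos:pos + 3]]
        residues = {CODON_TABLE["".join(c)] for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


if __name__ == "__main__":
    # Start of the nucleotide sequence given above for ORF72-1.
    print(translate("ATGGTCATAA AATATACAAA TTTG"))
```

Applied to the opening of the ORF72-1 nucleotide sequence, this should give the first residues of the listed protein; applied to the ambiguous stretch `AAyGCA GTmwrA` of the partial sequence, it resolves `AAy` and `GTm` but leaves `wrA` as `X`, in line with the `X` shown at that position in the partial ORF72 translation.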

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf72.pep   MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS

orf72a      MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

orf72.pep   DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA

orf72a      DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA

orf72.pep   HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD
            |||||||||||||||||||||||||:|
orf72a      HDVYETFKEDIQARGYQYDPETDKFAKVSGX



The complete length ORF72a nucleotide sequence <SEQ ID 271> is: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>: 
1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF 
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 

                    10        20        30        40        50        60
orf72a.pep  MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf72a.pep  DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                    70        80        90       100       110       120

                   130       140       150
orf72a.pep  HDVYETFKEDIQARGYQYDPETDKFAKVSGX
            |||||||||||||||||||||||||||||||
orf72-1     HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                   130       140       150



Homology with a predicted ORF from N. gonorrhoeae 

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N. 
gonorrhoeae: 

orf72.pep   MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60
            || |:|||||||||||||||||||||||||| ||||:|||||||||:||||||:|: |||
orf72ng     MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60

orf72.pep   DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120
            || |:||||| ||||||||||||||||||||||:|||||:| ||||:|||||||||||||
orf72ng     DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120

orf72.pep   HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173
orf72ng     HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF 

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL 

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA 

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIRFAVLLA FIIMSAFWFG SLGGE* 

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 



1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT 
This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF 
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF 

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap: 

                    10        20        30        40        50        60
orf72ng-1   MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS
            || |:|||||||||||||||||||||||||||||||:|||||||||:||||||:|: |||
orf72-1     MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf72ng-1   DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA
            || |:||||||||||||||||||||||||||||:|||||:| ||||:|||||||||||||
orf72-1     DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                    70        80        90       100       110       120

                   130       140
orf72ng-1   HDVYETFKEDIQARGCRYDPETDKF
            ||||||||||||||| :||||||||
orf72-1     HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                   130       140       150
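
The identity figure quoted for two aligned ORFs is the number of identical positions divided by the length of the overlap. The sketch below illustrates that calculation for an ungapped overlap starting at the first residue of both sequences, which is how ORF72ng-1 and ORF72-1 line up here. The function name `percent_identity` is hypothetical; the sequences are transcribed from SEQ ID 276 and SEQ ID 270, reading residues 21-30 of SEQ ID 276 as MYSFEANANA in agreement with the corresponding DNA sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over the ungapped overlap of two sequences.

    The sequences are compared position by position from their first
    residue; the overlap is the length of the shorter sequence.
    """
    overlap = min(len(seq_a), len(seq_b))
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / overlap


# ORF72ng-1 (SEQ ID 276), written in blocks of ten as in the listing.
ORF72NG_1 = "".join("""
MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF
""".split())

# ORF72-1 (SEQ ID 270), without the terminal stop symbol.
ORF72_1 = "".join("""
MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
""".split())

if __name__ == "__main__":
    print(f"{percent_identity(ORF72NG_1, ORF72_1):.1f}% identity "
          f"over {min(len(ORF72NG_1), len(ORF72_1))} aa")
```

This simple count does not handle gapped alignments, which would need a proper alignment algorithm before counting.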



Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 






Example 33 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 SCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC . . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>: 



1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 

51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG 

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL 

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR 
151 SRNAIEHKKD E* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| |||||:|||:|||:||||||||
orf73a      MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70
orf73.pep   MRSGGKVSVYQMLWPI
            |||||:|||| ||| |
orf73a      MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

The complete length ORF73a nucleotide sequence <SEQ ID 281> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT 

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA 

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG 

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX 

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR 

151 FRNAXEHKKD E* 

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap: 

                    10        20        30        40        50        60
orf73a.pep  MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| |||||:||||||||||||||||
orf73-1     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf73a.pep  MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

orf73-1     MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM

orf73a.pep  NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX

orf73-1     NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX

Homology with a predicted ORF from N. gonorrhoeae 

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 

orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
orf73ng     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep   MRSGGKVSVYQMLWPI 76
orf73ng     VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT 

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt 

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG 

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL 

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR 

151 SRNAIEHEKD E* 

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap: 

                    10        20        30        40        50        60
orf73-1.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf73ng     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf73-1.pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
            ::|:|:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf73ng     VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
                    70        80        90       100       110       120

                   130       140       150       160
orf73-1.pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
            ||||||||| :||||||||||||:| |||||||||||:||||
orf73ng     NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX
                   130       140       150       160

Based on this analysis, including the presence of a putative leader sequence and putative 
transmembrane domain in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
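
The text does not state how the putative leader sequence and transmembrane domain were predicted. As an illustration of one common approach only, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale and flags windows whose mean exceeds a threshold. The 19-residue window, the 1.6 threshold and the function names are assumptions made for this example and are not taken from the source.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Unknown residues (for example 'X') are scored as 0.0, an assumption.
    """
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return (1-based start, mean hydropathy) for windows at or above threshold."""
    return [
        (start + 1, round(mean, 2))
        for start, mean in enumerate(hydropathy_profile(seq, window))
        if mean >= threshold
    ]


if __name__ == "__main__":
    # First 60 residues of ORF73-1 (SEQ ID 280).
    n_terminus = "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA"
    for start, mean in hydrophobic_windows(n_terminus):
        print(start, mean)
```

A strongly hydrophobic stretch near the N-terminus, as flagged by a profile of this kind, is the sort of feature usually associated with a leader sequence or membrane-spanning segment; dedicated prediction tools apply more elaborate criteria.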



Example 34 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>: 

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC 

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC 

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA 

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA 

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT . . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 

1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 A....AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKAXXXXAEDTR
                     ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75a               MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR

                    70        80        90       100       110       120
orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR

orf75a      VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR

                   130       140       150       160       170       180
orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV

orf75a      RVREVGFKVVPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV

orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM

orf75a      MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM

orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK



The complete length ORF75a nucleotide sequence <SEQ ID 289> is: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 
151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 
251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 
351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 
401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC 
501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 
551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 
751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 
801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 
851 TGGCACTGTC TTGGAAAAAC AAA.TGA 



This encodes a protein having amino acid sequence <SEQ ID 290>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP 

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 



                    10        20        30        40        50        60
orf75a.pep  MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY

                    70        80        90       100       110       120
orf75a.pep  GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKV
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf75-1     GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV

                   130       140       150       160       170       180
orf75a.pep  VPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG
            ||||||||||||||||| |||||||||||||||||||||||||||:|||:||||||||||
orf75-1     VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG

                   190       200       210       220       230       240
orf75a.pep  ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75-1     ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75a.pep  EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290



Homology with a predicted ORF from N. gonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. 
gonorrhoeae: 

orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKA    AEDTR 56
orf75ng     MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR 60

orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
orf75ng     VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLAR

orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV
orf75ng     RVREAGFKVVPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVV

orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM
orf75ng     MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM

orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
            ||||||||||||||||||||| ||||:|||||||||||||||||||||||||
orf75ng     VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK



An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino 
acid sequence <SEQ ID 292>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 



After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 
151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 
201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 
251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 
351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 
401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 
501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 
551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 
751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 
801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 
851 TGGCACTGTC GTGGAAAAAC AAATGA 



This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap: 

                    10        20        30        40        50        60
orf75-1.pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75ng-1   MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf75-1.pep GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV

orf75ng-1   GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf75-1.pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
            ||||||||||||||||||  |||||||||||||||||||||||||||||:||||||||||
orf75ng-1   VPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf75-1.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75ng-1   ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75-1.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            |||||||||||| ||||:||||||||||||||||||||||||||||||||||
orf75ng-1   EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290
250 260 270 280 290 

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E.coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct: 2 KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 5 9 

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 12 3 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSOAGTPLINDPGYHLVRTCREAGIRWPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATL 183 

G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 17 9 

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

Sbjct: 180 EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238 

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 

EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 23 9 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 
Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
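
The BLAST summary lines quoted in this example give each count both as a fraction and as a percentage. The following is a minimal sketch of re-deriving the percentages from the fractions. The helper name `check_blast_summary` is hypothetical, and the assumption that the percentage is the fraction truncated to a whole number is inferred from the figures quoted in this document, not stated in it.

```python
import re

# Matches fields such as "Identities = 128/284 (45%)".
FIELD = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def check_blast_summary(line):
    """Return {field: (count, total, reported_pct, recomputed_pct)}.

    Assumption: the reported percentage is the fraction truncated
    to an integer (floor division), not rounded.
    """
    result = {}
    for name, num, den, pct in FIELD.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        result[name] = (num, den, pct, (100 * num) // den)
    return result


summary = check_blast_summary(
    "Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)"
)
for field, (num, den, reported, recomputed) in summary.items():
    print(f"{field}: {num}/{den} reported {reported}% recomputed {recomputed}%")
```

A mismatch between the reported and recomputed values would point to a transcription error in the summary line.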



Example 35 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 
751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>:

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ
51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 
101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 
151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 
201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 
251 KP* 
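
The correspondence between the nucleotide listing and the amino acid sequence can be checked by translating in the first reading frame with the standard genetic code. The sketch below is illustrative only: the helper names `parse_listing` and `translate` are hypothetical, and it assumes the listing rows have the form "position, then space-separated blocks of bases" as printed above.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, paired with AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def parse_listing(rows):
    """Join numbered listing rows ("1 ATGAAACAGA AAAAAACCGC ...") into one string."""
    seq = []
    for row in rows:
        fields = row.split()
        seq.extend(fields[1:])  # fields[0] is the position of the first base
    return "".join(seq).upper()


def translate(dna):
    """Translate whole codons in frame 1; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


first_row = ["1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG"]
last_rows = [
    "701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC",
    "751 AAACCGTAA",
]

n_term = translate(parse_listing(first_row))
# The row labelled 701 starts mid-codon; the next codon boundary is base 703.
c_term = translate(parse_listing(last_rows)[2:])
print(n_term, c_term)
```

The two fragments printed correspond to the start and the end of the amino acid sequence listed above, including the terminal stop (`*`).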

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an
ORF (ORF76a) from strain A of N. meningitidis:

10 20 30 

orf76.pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL

orf76a MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
10 20 30 40 50 60 

// 

70 80 90 



WO 99/24578 



-202- 



PCT/IB98/01665 



or f 7 6 . pep XELVRNQLEQGLRQEKARLKIDALLEENGVKPX 

orf76a DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX 
200 210 220 230 240 250 

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC

751 AAACCGTAA

This encodes a protein having amino acid sequence <SEQ ID 300>:



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 



10 20 30 40 50 60 

or f 76a . pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 

I I I I I I I I I I I I I I I I I I I I I i I I ! I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I ! 

orf 7 6-1 MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGOAIRND 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 7 6a. pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEAS FYAEEYVRFLERSETVSESALRQF 

I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I:: I 

orf 76-1 AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 7 6a. pep YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! 
orf76-l YEQQIRMIKLQQVSFATEEEARQAQQLLLKGL3FEGLMKRYPNDEQAFDGFIMAQQLPEP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 7 6a. pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 7 6-1 LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 

190 200 210 220 230 240 



IDAILEENGVKPX 
I DALLEENGVKPX 
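
The identity figure quoted for the alignment above can be reproduced by counting matching positions, since the two sequences align without gaps. This is a minimal sketch, not the alignment program used in the original work. The function name `percent_identity` is hypothetical, and the two strings are transcribed from SEQ ID 298 and SEQ ID 300 with evident scanning errors corrected (an assumption) and with the terminal stop excluded.

```python
ORF76_1 = (
    "MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQ"
    "KPDGQAIRNDAVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAE"
    "EYVRFLERSETVSEDELHKFYEQQIRMIKLQQVSFATEEEARQAQQLLLK"
    "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAAMNRGDVTRDPVKL"
    "GERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGV"
    "KP"
)
ORF76A = (
    "MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQ"
    "KPDGQAIRNDAVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAE"
    "EYVRFLERSETVSESALRQFYERQIRMIKLQQVSFATEEEARQAQQLLLK"
    "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAAMNRGDVTRDPVKL"
    "GERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGV"
    "KP"
)


def percent_identity(a, b):
    """Return (identical_positions, percent) for two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    identical = sum(x == y for x, y in zip(a, b))
    return identical, 100.0 * identical / len(a)


identical, pct = percent_identity(ORF76A, ORF76_1)
print(f"{identical}/{len(ORF76_1)} identical = {pct:.1f}%")
```

Sequences of unequal length, or alignments containing gaps, would need a proper pairwise alignment step first.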



Homology with a predicted ORF from N. gonorrhoeae 



The N- and C-terminal aa sequences of ORF76 and of a predicted ORF (ORF76.ng) from N. gonorrhoeae
show 96.7% and 100% identity in 30 aa and 31 aa overlaps, respectively:

orf76.pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf76ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60 

// 

orf7 6.pep ELVRNQLEQGLRQEKARLKIDALLEENGVKP 251 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7 6ng VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 251 

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is:

1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc

751 AaacCGTAA

This encodes a protein having amino acid sequence <SEQ ID 302>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap:

10 20 30 40 50 60 

MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I ! I I 
MKQKKTAAAVIAAMLAGFAAAKAPE I DPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 
10 20 30 40 50 60 

70 80 90 100 110 120 

AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEAS FYAEEYVRFLERSETVSEDELHKF 

II! I I I I I I I I I I I i I I I I I I I I I I IIIIMII: |::| 

AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEAS FYAEEYVRFLERSETVSESALRQF 
70 80 90 100 110 120 

130 140 150 160 170 180 

YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I 

YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
130 140 150 160 170 180 

190 200 210 220 230 240 

LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 

I: I I I I I I I I = I I I I I I I I I I I I I: I I I I I I I II Ill 

LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 
190 200 210 220 230 240 

250 

I DALLEENGVKPX 
I I I I I I I I I I I II 
I DALLEENGVKPX 
250 

Furthermore, ORF76ng shows significant homology to a B.subtilis export protein precursor: 



orf7 6-l .pep 
orf7 6ng 

orf76-l.pep 
orf 76ng 

orf 76-1. pep 
orf76ng 

orf 7 6-1. pep 
orf 76ng 

orf76-l.pep 
orf7 6ng 
sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein
[Bacillus subtilis]

>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis]
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis]
Length = 292
Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%) 

Query: 70 VLKNRALKEGLDK DKDVQNRFKIAEASF YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

Sbjct: 53 VLTQLVQEKVLDKKYKVS DKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 

Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 

A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218 

DAG F Q+E+ + G+V+ DPVK Y++ K +E D 

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETF3KAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231 

Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 
Sbjct: 232 MKKE LKSE VLEQKLN DNAA 250 



Based on this analysis, including the presence of a putative leader sequence and an RGD motif in
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
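
The RGD motif mentioned above can be located with a plain substring scan. The sketch below is illustrative: the function name `find_motif` is hypothetical, and the sequence is transcribed from SEQ ID 302 with evident scanning errors corrected (an assumption) and without the terminal stop.

```python
ORF76NG = (
    "MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQ"
    "RPDGQAIRNDAVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAE"
    "EYVRFLERSETVSESALRQFYERQIRMIKLQQVSFATEEEARQAQQLLLK"
    "GLSFEGLMKRYPNDEQAFDGFIMAQQLPEPLASQFAGMNRGDVTRNPVKL"
    "GERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGV"
    "KP"
)


def find_motif(seq, motif):
    """Return the 1-based start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = seq.find(motif, start + 1)
    return positions


print(find_motif(ORF76NG, "RGD"))
```

The reported position falls in the row of the listing that begins at residue 151, within the block `LASQFAGMNR GDVTRNPVKL`.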



ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result), 
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed 
protein, and that it is a useful immunogen. 



Example 36 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCC1TACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>: 

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY
51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>:

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 
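
A quick consistency check between a nucleotide listing and its translation is that the ORF length is a whole number of codons and that the residue count is one less than the codon count (the stop codon is not a residue). The sketch below uses a hypothetical helper, `orf_lengths`, and takes the position label and base count of the final listing row as inputs.

```python
def orf_lengths(last_row_start, last_row_bases):
    """Return (nucleotides, residues) for a complete ORF listing.

    last_row_start: position label printed on the final listing row
    last_row_bases: number of bases printed on that row
    Assumes the listing ends with a stop codon, which is not counted as a residue.
    """
    nucleotides = last_row_start + last_row_bases - 1
    if nucleotides % 3:
        raise ValueError("length is not a whole number of codons")
    return nucleotides, nucleotides // 3 - 1


# ORF81-1: the final row is "1551 GGCGGAATAT GTTTATCCGC AATGA" (25 bases).
print(orf_lengths(1551, 25))
# ORF76-1: the final row is "751 AAACCGTAA" (9 bases).
print(orf_lengths(751, 9))
```

The residue counts obtained this way agree with the last numbered rows of the corresponding amino acid listings.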

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF81 shows 84.7% identity over a 85aa overlap and 99.2% identity over a 121aa overlap with 
an ORF (ORF81a) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 81 . pep MKKSFLTLVLYSSLLTAS EIAYPLELGIETLPAAK IAETFALTFVIAALYLF ARNKVTRL 

orf81a MKKSLFVLFLYSSLLTAS EIAYRFVFGIETLPAA KMAETFALTFVIAALYLF ARYKATRL 
10 20 30 40 50 60 

70 80 
LIAVFFAFSIIANNVHYADYQSWMT

LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE



QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 



210 220 230 

orf 81 .pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

I I I I I I 1 I I I I 

orf 81a CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
400 410 420 

The complete length ORF81a nucleotide sequence <SEQ ID 307> is:

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 

51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 

301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 

701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT

801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 

1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1251 GGCGGAATAT GTTTATCCGC AATGA 

This encodes a protein having amino acid sequence <SEQ ID 308>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY

51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG

101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ* 

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 

10 20 30 40 50 60 

orf 8 la. pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL 

orf 8 1-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 






70 80 90 100 110 120 

orf81a.pep LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE

orf81-1 LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE

70 80 90 100 110 120 



130 140 150 160 170 180 

orf81a.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

orf81-l VMLFCSLAKFRRKTHFSADILFAFLMLMI FVRS FDTKQEHGI S PKPTYSRIKANYFS FGY 

130 140 150 160 170 180 



FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 

I I I I I I I I I I I I I I : I I : I I I I II: ■ I I I I 

FVGRVLPYQLFDLSRI PAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 
190 200 210 220 230 240 



orf81a.pep 
orf81-l 



-IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 



330 340 350 360 370 380 

orf81a.pep AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF

           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

orf81-1    AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF
430 440 450 460 470 480 

390 400 410 420 

orf81a.pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

orf81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

490 500 510 520

Homology with a predicted ORF from N. gonorrhoeae

The N- and C-terminal aa sequences of ORF81 and of a predicted ORF (ORF81.ng) from N. gonorrhoeae
show 82.4% and 97.5% identity in 85 aa and 121 aa overlaps, respectively:



orf81.pep 
orf81ng 
orf81.pep 
orf81ng 

orf 81ng 
orf 81. pep 
orf 81ng 



MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 
MKKSLFVLFLYSSLLTASEIAYRFVFGIETLFAAKMAETFALTFMIAALYLFARYKASRL 



111111111:111 



I I I I I 



I I I I I I I I 



II I I I I 



I I I 



60 



LIAVFFAFS IIANNVHYADYQSWMT 85 
I I I I I I I I I : I I I I I I I I I I I I I I 

LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 120 

QTVFEQLQKT PDGNWL FAYT S DHGQYVRQD 433 

ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 433 

IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 



IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 
orf81.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524

I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 

orf81ng CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ 524 

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC 

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA 

1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATAA 

This encodes a protein having amino acid sequence <SEQ ID 310>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRNGKAEY VYPQ*

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 



10 20 30 40 50 60 

orf81ng-l.pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 
I I I I : : : I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I : I I I I I I I I I I I : : I I 
orf81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL

10 20 30 40 50 60 



orf 81ng-l.pep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 
or f 8 1- 1 LIAVFFAFS I IANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 



orf81ng-l.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 
I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf81-1 VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY






orf81-l FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf81ng-l .pep LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I ! I I I I I I I I I 
orf81-l LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 81ng-l . pep TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 

orf81-l TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf 81ng-l . pep IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
orf 81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

370 380 390 400 410 420 



orf81ng-l.pep 
orf81-l 



LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX 



Furthermore, ORF81ng shows significant homology to an E.coli OMP:

gi|1256380 (050906) outer membrane adherence protein-associated protein [E. coli] Length = 547
Score = 87.4 bits (213), Expect = 2e-16
Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Query: 25 VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS--RLLIAVFFAFSMIANNVHYAVYQ 81
VFGI LA+A LF+++R + RLL+A F + A ++ ++Y

Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLTILWKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86

Query: 82 SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134

SW T G ++ + EV A ML ++ P L A + L +
Sbjct: 87 SWCTFGTTFNDGFAISVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVIIKYDV 141



r KQSYSAGFKTAVSLP SFFNVIPHANGLEQISGGDTNMFRLAKEQG 2 98 

Q+ S TA+S+P + +V+ H I N+ +A + G 

!"NQAI SGAPYTALS VPLSLTADSVLSH DIHNYPDNIINMANQAG 310 

--ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355 

+N A+ ++ ++ + Y G DE LLP + Q 
I'RQNGTAVTSI AMRAMETVYVRGF DELLLPHLSQALQQ 359 



IVLH GSH P 



D D YDN+IH TD H 
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Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQC— -TVQPDSYIVPL-VLYSP 454 

D Y +DHG ++++Y G +Y VP+ + YSP 

Sbjct: 419 — DRFASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 37 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>: 

1 . . .ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A... 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 



1 . . TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 



1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:



1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of N.
meningitidis:



orf 83 .pep 
orf 83a 



20 



30 



40 



50 



TLLLFIPLVLTX CGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 
III : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MKTLLXLIPLVLTA CGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



orf 8 3. pep YVSVMGDQGSGNISGGRY3IDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
I I II I I M I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I M I I I I I 
orf 8 3a YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
70 80 90 100 110 120 



orf83.pep 
orf83a 



TSLLNAPAAXLTKNSGRKGERSAGLSTOGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 



180 190 
orf 8 3. pep IEWPPXYADTDVFVTVDV 
I II I I II I I I I I I 

orf 83a IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 
190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>: 

1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 
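
The relationship between a nucleotide listing and the protein it encodes can be checked by translating the DNA codon by codon. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification. Codons containing an undetermined base (such as `N`) are reported as `X`, which is how the listings above appear to mark unresolved residues.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    # Drop the spaces and line breaks used in printed sequence listings.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 21 bases of the ORF83a listing, including the undetermined base N.
    print(translate("ATGAAAACCC TGCTCNTCCT C"))
```

Stop codons are written as `*` and translation continues past them, matching the way the listings print a terminal `*`.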

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 

10 20 30 40 50 60 

orf83a.pep MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL
orf83-1 MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL



orf83a.pep 
orf83-l 



YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
YVSVMGDQGSGNISGGRYS IDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 



orf83a.pep 
orf83-l 



TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 



190 200 210 220 230 240 

orf8 3a.pep IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

orf8 3-l IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf83a.pep TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 

orf83-l TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 



orf83a.pep 
orf83-l 



DVGNEVIRRRKGGX 



DVGNEVIRRRKGGX 
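
Identity figures such as the one quoted for this alignment can be sketched as the number of identical aligned positions divided by the length of the overlap. The specification does not name the alignment program or its exact counting rules, so the function below is an assumption-laden illustration: `percent_identity` is a hypothetical helper, gap columns are excluded from the overlap, and an undetermined residue `X` is never counted as a match.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned (gap-free) columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no overlapping columns")
    # Assumption: an undetermined residue (X) is treated as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "X")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # First ten residues of ORF83a and ORF83-1 (one undetermined position).
    print(percent_identity("MKTLLXLIPL", "MKTLLLLIPL"))
```

Under these assumptions, 308 identical positions in a 313-residue overlap round to 98.4%.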



Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N. 
gonorrhoeae: 

orf83.pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 58 

orf83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 60 

orf83.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 118 

orf83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 120 

orf83.pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 17 8 

I I I I I I I : M I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I | | | I | | | | 

orf83ng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 180 

orf83.pep IEWPPXYADTDVFVTVDV 197 
Mill! I I I I I I I I I I I I 

orf83ng IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 2 40 

The complete length ORF83ng nucleotide sequence <SEQ ID 317> is: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC 

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 318>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG*

ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap:



or f 8 3 - 1 . pep MKTLLLL I PLVLTACGTLTG I PAHGGGKRFAVEQELVAAS SRAAVKEMDLSALKGRKAAL 
orf83ng MKTLLLLIPLVLTACGTLTGIPAKGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



YVSVMGDQGSGNISGGRY3IDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

I I I I I I I I I I I I: I I I : I 11111:1111 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 
70 80 90 100 110 120 

130 140 150 160 170 180 

TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

I I I I I I I : I I I I I I I I I I I I I I I I 

TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 

I I I I I I I I I Ml I I I I : I I 

IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

190 200 210 220 230 240 



orf83-l.pep 
orf83ng 



DVGNEVIRRRKGGX 
I I I I I I I I I 1 I I I I 
DVGNEVIRRRKGGX 



Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop) 
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein 
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies.

Example 38

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
319>: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTAysrr TmnGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA
751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 
801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 
951 gaAAGAAGTG ACGGaGTTGA TGTGcgaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGcCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT

1151 TGAAGGAATC GG^CGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN*

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA



This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N. 
meningitidis: 



orf 84 .pep 



MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 
I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I : : I I I I I I I I I I I I I I I I I I I I I I I I I 
MAEICLITGT PGSGKTLKMV SMMAN DEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 



70 80 90 100 110 120 

orf 84 .pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

orf 8 4a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 84 . pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

orf 84a IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 84 . pep LDKKVYDLYXXAEVHTVNKVKRSKW FYTLPVIVLLI PVFV GLSYKMLSSYGKKQEEPAAQ 

orf 84a LDKKVYDLYESAEVHTVNKVKRSKW FYTLPV I I LLI PVFVGL SYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 84 .pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 
111111:111: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I : 
orf 84a ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 84 . pep EGGRTGCACYSHQGTALKEVTELKCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

orf 84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 
310 320 330 340 350 360 



370 380 390 

orf 84 .pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

orf 84a ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 
370 380 390 

The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA


This encodes a protein having amino acid sequence <SEQ ID 324>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV

201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN*



ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 



orf84a pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 

IIIIIMI I I 1 1 1 I I I IIIIIIIIIIIIIIIM Ill 

orf84-l MAETCLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf84a.pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 



130 140 150 160 170 180 

IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

IMIIIIM I I I II II I I I I 11 I I I I I . 11 I I I I I M i IIMM 

IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYESAEVHTVNPVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ 

IIIMIII IIIMMMMMIMMMIMIMI IIIMIMIIIMM 

LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 
190 200 210 220 230 240 



250 



260 



270 



290 



300 



ESAATEHQAVFQDKTEGEPVNKGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

111111:111: I I I I I I I I 1 I I I I I I I I I I II I I I 'HI MIM: 

ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
250 260 270 280 290 300 



EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
310 320 330 340 350 360 



ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 

IMIII llilllll-.llllllllll IIIMIII 

AT LGGK PXQN LMY DNWE E RGK P FE G I GGG WG S ANX 

370 380 390

Homology with a predicted ORF from N. gonorrhoeae

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N.
gonorrhoeae:


orf 84 .pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 60 

llll I I MM I I I I I I I I |:::| I I I I I I I I I I II I I I : I I I I I I I 

orf84ng MAEICLITGTPG3GKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 60 

orf 84 .pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120 

I I I I I I I I I I I I I II I I I M I I I : I : I I I M I I I II I I I I II I I I I I I I I II I I I I I I I I 

orf84ng LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120 

orf 84 .pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 180 

I I I I I II I I I I I I I I I I I I I I :: I I I I I : I I I I : I I I I I I I : II I I II I I I I I I I I II I I 

orf84ng IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 180 

orf 84 .pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 240 

I I I I I I I I I I I : I I I I I I I : I I I I : I I I I : I I II I I I I I: I I I II 

orf84ng LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 240 

orf 84 .pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml I I II II I II II I I I I I I I I II 

orf84ng ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGC1 300 

orf 84 .pep EGGRTGCACY3HQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

I I I I I I I M II I I I I I II I 

orf 84ng EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 3 60 



orf 84ng 



ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSAN 
I I I I I II I I I I II II I I I I I I I I I I I I I I I I I I I 
ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSAN 



The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 



1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA

101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA

This encodes a protein having amino acid sequence <SEQ ID 326>:



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN*

ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap:



MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
I I I I I I I I I 1 I I i I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I I I I I I 
MAEICLITGTPG3GKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 



10 



20 



30 



40 



50 



60 



orf84-l.pep 
orf 84ng 



orf84-l.pep 
orf 84ng 



orf84-l.pep 
orf 84ng 



orf84-l.pep 
or£84ng 



orf 84-1 . pep 
orf 84ng 



70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I Mil! I I I I I I I I I I I I I I I I I I 

LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 



130 



140 



150 



160 



170 



180 



I D I FVLTQGPKLLDQNLRT LVRKH YH I ASNKMGMRTLLEWKICADDPVKMAS SAFSSIYT 
I I I I I I I I I I I I I I I I I I I I I :: I I I I I : I I I I : I I I I I I I : I I I I I I I I I I I I I I I I I I 
I DI FVLTQGPKLLDQNLRT LVKRHYHIAANKMGLRTLLEWKVCADDPVKMAS SAFSSIYT 
130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

II :lllllllllllll:lllt:llll:llll : I I I I I I I I I I I I 

LDKKVYDLYE SAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEQQAVLPDKTEGEPWNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 



310 



320 



330 



340 



350 



360 



EGGRTGCACYSHQGTALKEVTELMCKDYVKNGI.PFNPYKEESQGQEVQQSAQQHSDRAQV 

I i I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I II M 

EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
310 320 330 340 350 360 



orf 8 4-1. pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

40 I I I I I I I I I I 

orf84ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX 
370 380 390 

Based on this analysis, including the presence of a putative transmembrane domain
(single-underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
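
The two sequence features named in this conclusion can be illustrated with a short script. This is a sketch under stated assumptions rather than the analysis actually performed: the specification does not say which software produced the predictions. Here the P-loop is assumed to follow the widely used PROSITE-style consensus [AG]-x(4)-G-K-[ST], and the transmembrane scan is assumed to be a Kyte-Doolittle sliding-window hydropathy average (a 19-residue window with a mean above about 1.6 is a common rule of thumb). The names `find_p_loops` and `hydropathy_profile` are hypothetical.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A: [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score for every full window along the sequence."""
    # Residues outside the scale (e.g. X) contribute 0.0 by assumption.
    scores = [KD.get(res, 0.0) for res in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    # Residues 1-20 and 201-240 of the gonococcal protein listed above.
    print(find_p_loops("MAEICLITGTPGSGKTLKMV"))
    profile = hydropathy_profile("KRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ")
    print(max(profile) > 1.6)
```

On the N-terminal residues the pattern picks out GTPGSGKT starting at residue 9, and the hydrophobic stretch around residues 206 to 224 scores well above the assumed threshold.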



Example 39 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>: 

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT 

51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC 

101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG 

151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT 

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA 

251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA 

301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT 

351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG 

401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT 

451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG 

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC 

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT

1101 TTTGGTCTAT CTC...

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>:

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF

351 SEVRSSGLQM TRSXGPLLVY L...

Further work revealed the complete nucleotide sequence <SEQ ID 329>:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA

801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA

2001 CTTGAATCAT GACTGA

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N.
meningitidis:



orf 88 .pep 
orf88a 



MVFLNADNGILVQDLPFEVKLKKFHIDFYN 



orf 88 .pep 
orf 88a 



TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 



orf 88 .pep 
orf88a 



ASREPWLKATS IHQFPLEIGKHKYRLSFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 



160 170 180 190 200 210 

orf 88. pep TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI 

orf 88a TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI 
390 400 410 420 430 440 



orf 88 .pep 
orf88a 



orf 88 .pep 
orf88a 



340 350 360 370 

orf 88 . pep DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGP LLVYL 

I 1 I I I I I I I I I I 1 I I I I I I I I 

orf 88a DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPGA LLVYLGSVLLVLGTVLM FYVREKR 

570 580 590 600 610 620 



The complete length ORF88a nucleotide sequence <SEQ ID 331> is:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA

801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA

2001 CTTGAATCAT GACTGA



This encodes a protein having amino acid sequence <SEQ ID 332>: 



1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*



ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap:

orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60

orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180

orf88a.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88a.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88a.pep  LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88a.pep  SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88a.pep  PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88a.pep  GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540

orf88a.pep  LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88a.pep  PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1     PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660

orf88a.pep  LQRLGKDLNHD 672
            |||||||||||
orf88-1     LQRLGKDLNHD 672

Homology with a predicted ORF from N. gonorrhoeae 

ORF88 shows 93.8% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N. 
gonorrhoeae: 

orf 88 .pep MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf88ng MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

orf88 .pep PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

orf88ng PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFD 12 0 

orf 88 .pep QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 

orf88ng QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180 

orf 88 .pep YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240 

orf88ng YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 24 0 

orf 88 .pep ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 300 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I II 

orf88ng ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 300 

orf 88 .pep NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 3 60 

orf88ng NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPML^ 360 

orf 88. pep TRSXGPLLVYL 371 
III I I I I I I 

orf88ng TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 420 
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An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino 
acid sequence <SEQ ID 334>: 

1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV

151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM

301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF

351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI

401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 



1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 

201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 

251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 

301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 

401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG 

501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 

1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC 

1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT

1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC 

1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 

1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 

1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 

1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 

1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 

2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE

151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT

301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS

401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD

451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI

501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS

601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL

651 QKEFPKHVES LQRLGKDLNH D*

ORF88ng-1 and ORF88-1 show 97.0% identity in 671 aa overlap:

orf 88-1 . pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQT DYLVKFGSFWA 60 

orf88ng-l MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60 

orf 88-1. pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf88ng-l RIFDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf 88-1 .pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

orf88ng-l SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180 

orf88-l .pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 240 

I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf88ng-l GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 240 

orf 88-1. pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

I I I I I I : I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I 
orf88ng-l LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf88-l .pep LHGITIYQASFADGGSDLTFKAWMLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

I I I I I I 1 I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf88ng-l LHGITI YQASFADGGSDLTFKAKNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 3 60 

orf 88-1 .pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf88ng~l SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88-1 .pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

I : I I : : I I I I : I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
orf88ng-l PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 88-1. pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 54 0 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf88ng-l DAPAE I REQ FMLAAENT LN I FAQKGYLGLDE FIT SN I PKGQQDKMQGY FYEMLYGVMNAA 54 0 

orf 88-1. pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I II I I II I I I I I 
orf88ng-l LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 88-1 . pep PGALLVYLGS VLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

I I I I I I I I II I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf88ng-l PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 88-1. pep LQRLGKDLNHD 671 

orf88ng-l LQRLGKDLNHD 671 

Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%) 

Query: 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74
+ F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S
Sbjct: 80 YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139

Query: 75 AWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134
++++ ++ L V+ C I+ +P W++ S +E++ + A +H + VKI P+ K
Sbjct: 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197

Query: 135 --RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192
++L +GF+ V E + + A+KG ++ G +AL+VI G LID
Sbjct: 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 249

Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252

+I+G RG++ ++EG + DV+ + A+ L 

Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE— QKPYKL 280 

Query: 253 PFEVKLKKFHIDFY— NTGMPRDFA SDIEVTDKATGEKLER — TIRVNHPLT 300 

PFVLFIY N++FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKWEPFD 337 

Query: 301 LHGITIYQASFA— DGGSDLTFKAWNLRDASREP 332 

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 

Based on this analysis, including the putative transmembrane domain in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 40 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
337>: 



1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 

1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
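
As an illustrative check (not part of the original disclosure), the correspondence between SEQ ID 337 and the ORF89 amino acid sequence can be reproduced with a short Python sketch. It assumes the standard genetic code, reads the sequence in frame from position 1, and renders any codon containing an ambiguity code (such as m or y) as X. The name `translate` is hypothetical and chosen only for this example.

```python
# Minimal sketch: in-frame translation with the standard genetic code.
# Assumption: codons containing anything other than A, C, G, T become "X".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-codon lookup table from the compact TCAG-ordered string.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 nucleotides of SEQ ID 337, including the ambiguous positions.
print(translate("ATGATGAGTA ATAmAATGGm ACAAAAAGGG"))
```

Under these assumptions the first thirty nucleotides give the same ten residues as the start of ORF89, with X at the two positions whose codons contain an ambiguous base.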

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with PilE of N. gonorrhoeae (accession number Z69260). 
ORF89 and PilE protein show 30% aa identity in 120 aa overlap:
orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66

QKGFTLI MIV+AI+GI++ +A+P+Y Y+ S+ G+ ++L++

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64

orf89 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVASSDKIKGKYVQSVTVAKGWTAEMASTGVNKEIQGKKLSLW 115 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N.
meningitidis:



orf89.pep 



MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 
I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I I I I I I I I I I I I 

MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 



ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

I :: II :||:|||:||:: 11111:1111 

ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 
70 80 90 100 110 120 



TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 



The complete length ORF89a nucleotide sequence <SEQ ID 341> is:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT
51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT
101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC
301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG




This encodes a protein having amino acid sequence <SEQ ID 342>:



MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 

VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 
DVGCEAFSNR KK* 



ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap:



MMSNKMEQKG FTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MMSNKMEQKG FTLIEMMIWAILGI I SVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 



ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

llllllllllll::|lllllllllllllll:l|:|||:||:: II:! 1 

ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
70 80 90 100 110 120 



TLSVWMNSVG DGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

lllllllll.llll MlllhllllMllllllllMI III 

TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 



WO 99/24578 






PCT/IB98/01665 



Homology with a predicted ORF from N. gonorrhoeae

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N.
gonorrhoeae:

orf89      MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60

orf89ng    MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60

orf89      ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120
           ||||| |||:|:::||:||||||||||||||||||||:||| || |||||||||:|||||
orf89ng    ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120

orf89      TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162

orf89ng    TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA 

201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC 

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA

401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA 

451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG 

This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV

101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA
151 DSGCEAFSNR KK*

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:

                     10        20        30        40        50        60
orf89-1.pep  MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF

orf89ng      MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf89-1.pep  ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY

orf89ng      ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY
                     70        80        90       100       110       120

                    130       140       150       160
orf89-1.pep  TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX

orf89ng      TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX
                    130       140       150       160

Based on this analysis, including the gonococcal motifs and the homology with the known PilE
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA 

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC. . . 

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>: 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP . . . 

Further work revealed the complete nucleotide sequence <SEQ ID 347>: 



1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 91. pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I i I I I I I : I I I I I I I I I I I I I I I I 
orf 91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 



YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 
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orf 91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

The complete length ORF91a nucleotide sequence <SEQ ID 349> is:

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK* 

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 

                    10        20        30        40        50        60
orf91a.pep  MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP
            |||||:||||||||||||||||||||||:||||||||||||||:||||||||||||||||
orf91-1     MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf91a.pep  YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91a.pep  KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91a.pep  GVDGLIAELKAKNGSKX
            ||||||||||||||:||
orf91-1     GVDGLIAELKAKNGGKX
190 
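
The 98.0% identity reported for ORF91a and ORF91-1 can be illustrated with a short Python sketch. This is not the alignment program used in the original analysis: it simply counts identical positions over an ungapped, equal-length overlap, which is sufficient here because the two sequences align without gaps. The sequences are copied from SEQ ID 348 and SEQ ID 350 with obvious scanning errors repaired, and the function name `percent_identity` is hypothetical.

```python
# Minimal sketch: percent identity over an ungapped, equal-length overlap.
# Assumption: sequences are transcribed from SEQ ID 348 (ORF91-1) and
# SEQ ID 350 (ORF91a); the terminal stop symbol is excluded.
ORF91_1 = (
    "MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTA"
    "RQKAEAYAIPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYS"
    "GTMLKLKNANVNVKDNPIVNKGGKEIIVRAEVGVPGQKPVNMDFTTYQSG"
    "GKYRTYNVAIEGASLVTVYRNQFGEIIKAKGVDGLIAELKAKNGGK"
)
ORF91A = (
    "MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTA"
    "RQKAEAYAIPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYS"
    "GTMLKLKNANVNVKDNPIVNKGGKEIIVRAEVGVPGQKPVNMDFTTYQSG"
    "GKYRTYNVAIEGASLVTVYRNQFGEIIKAKGVDGLIAELKAKNGSK"
)


def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions at which a and b carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (gaps are not handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(f"{percent_identity(ORF91A, ORF91_1):.1f}% identity in {len(ORF91_1)} aa overlap")
```

With four differing positions out of 196, the sketch gives a value that rounds to the 98.0% stated above. Alignments that contain gaps, such as the BLAST comparisons elsewhere in this document, would need a proper alignment step first.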



Homology with a predicted ORF from N. gonorrhoeae 

ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N. 
gonorrhoeae: 

orf 91. pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60 

: I I I I : I I I I I I I I I I I I I I I I : I I I I I : I I I I I I I I I I : I I I : I I I : I I I I I I I I : I 
orf91ng VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

orf 91. pep Y FDFQRMTALAVGN PWXTXS DXQKQALAXE FQP 93 

orf 91ng YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA 

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA 

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG 

551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

10 20 30 40 50 60 

MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSTLKNGDANTARQKAEAYAIP 

I I I I I: I I I I I I I I I I I I I I I I: I I I I I: I I! I I I ! I I I : I I 1: I I I :| : I 

MKKS3FISALGIGIL3IGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 
10 20 30 40 50 60 

70 80 90 100 110 120 

YFD FQRMTALAVGN PWRT AS DAQKQALAKE FQT LL I RT YSGTMLKLKNANVN VKDN P I VN 

I I I 1 I I I I I I I i I I I I I I I I I ! i I I I I : I I I : I I I 

YFD FQRMTALAVGN PWRT AS DAQKQALAKE FQTLL I RT Y S GTMLKFKNAT VN VKDN P I VN 
70 80 90 100 110 120 



orf 91-1. pep 
orf 91ng-l 

orf91-l.pep 
orf91ng-l 



130 140 150 160 170 180 

orf 91-1. pep KGGKEI IVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 

I I I I I I : I I II I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 

orf91ng-l KGGKEIVVRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK 

130 140 150 160 170 180 



orf 91-1. pep GVDGLIAELKAKNGGKX 

orf91ng-l GIDGLIAELKAKNGGKX 
190 

In addition, ORF91ng-1 shows homology to a hypothetical E. coli protein:

sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59 VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118 

+PY + AL +G +++A+ AQ++A F+L+Y ++ T+ P 
Sbjct: 65 LPYVQVKYAGALVLGQYYKSAT PAQREAY FAAFREYLKQAYGQALAMYHGQTYQIA — PE 122 



Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG — GKYRTYNVAIEGTSLVTVYRNQFG 174 

G K IV +R + P G+ PV +DF ++ G ++ Y++ EG S++T +N++G 
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182 
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Query: 175 EIIKAKGIDGLIAELKA 191 

+++ KGIDGL A+LK+ 
Sbjct: 183 TLLRTKGIDGLTAQLKS 199 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 
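The relationship between a nucleotide listing and its amino acid sequence can be illustrated with a short sketch. It assumes the standard genetic code and reading frame 1; the function name `translate` is hypothetical, and any codon containing a character other than A, C, G or T is simply reported as X, which is a simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 60 bases of SEQ ID 357.
start = "ATGAAACACATACTCCCCCTGATTGCCGCATCCGCACTCTGCATTTCAACCGCTTCGGCA"
print(translate(start))  # prints: MKHILPLIAASALCISTASA
```

The printed residues agree with the first twenty residues of SEQ ID 358; the stop codon is rendered as `*`, as in the listings.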

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf97.pep   MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG
            | |||||  |||||||||| |||||| ||||||| |||| |||||         ||||||
orf97a      MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG

                    70        80        90       100       110       120
orf97.pep   MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97a      MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK

                   130       140       150       160
orf97.pep   VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
            |||||||||||||||||||||||||||||||||||| |||
orf97a      VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX



The complete length ORF97a nucleotide sequence <SEQ ID 359> is: 

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCAT AGGCGAATAA 


This encodes a protein having amino acid sequence <SEQ ID 360>: 

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 

10 20 30 40 50 60 

orf97a.pep MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

orf 97-1 MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 



MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

I I I I I I I I I I I I 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 



orf 97a . pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX 
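The quoted identity figure can be checked with a simple position-by-position comparison. The sketch below is only an illustration under stated assumptions: the two sequences are taken from SEQ ID 358 and SEQ ID 360 with spacing removed, no gaps are needed because both are 159 residues long, and an undetermined residue (X) is counted as a mismatch. The function name `percent_identity` is hypothetical, and the original figure was produced by an alignment program rather than by this code.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF97-1 (SEQ ID 358), stop symbol omitted.
ORF97_1 = (
    "MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVS"
    "RLETAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVLVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTVGE"
)
# ORF97a (SEQ ID 360), stop symbol omitted.
ORF97A = (
    "MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVS"
    "RLETAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVXVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTIGE"
)

print(round(percent_identity(ORF97A, ORF97_1), 1))  # prints: 95.6
```

Seven of the 159 positions differ (five undetermined residues in ORF97a plus the S/Q and V/I differences), giving 152/159, which rounds to the 95.6% stated above.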



Homology with a predicted ORF from N. gonorrhoeae 

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N. 
gonorrhoeae: 

orf 97 .pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60 

I I I I I I I I I I ! : I I I I I I I I I I : : I I I I II I I I I I I I I I I I : : I I I I I I 

orf 97ng MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

orf 97 .pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 12 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 97ng MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

orf 97. pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159 

I I : I I I I I I I I I : I I II : I II I I I I I I I I I I I II I I I I I 
orf97ng VRTAYTDTRALIVGSRISFDEVANTLAKAEKLIQKTVGE 159 
The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein 
having amino acid sequence <SEQ ID 362>: 

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 



1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG 

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>: 



1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 

151 KLIQKTVGE* 

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap: 

10 20 30 40 50 60 

orf 97-1. pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

orf97ng-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 97-1 . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf97ng-1 MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

70 80 90 100 110 120 

130 140 150 160 

orf 97-1. pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 

orf97ng-l VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX 
130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures 
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion 
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for 
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These 
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen. 
Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1. 



Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
365>: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N. 
meningitidis: 

10 20 30 40 50 59 

orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 

I I I I I I I I I I I II:: II : : :: I I I I I I i I I I I I I I : I I I I I I I I II I I I I I I 
orf 106a MAFITRLFKS IKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ 

10 20 30 40 50 60 

50 60 70 80 90 100 110 119 

orf 106 . pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 
orf106a LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA 
70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf106.pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 

orfl06a FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
130 140 150 160 170 180 

180 190 199 

orfl06.pep SQNWHLDSGWKPLNIIGNKX 
I I I I I I I I I I I I I I I I I I I I 
orf 106a SQNWHLDSGWKPLNIIGNKX 
190 200 

Due to the K->N substitution at residue 111, the homology between ORF106a and ORF106-1 is 
87.9% over the same 199 aa overlap. 
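The effect of a single substitution on the quoted percentage can be worked through as follows. The identical-residue counts are not stated in the text; in this sketch they are inferred from the quoted percentages and the 199 aa overlap, which is an assumption, and both function names are hypothetical.

```python
def implied_identities(percent, overlap):
    """Whole number of identical residues implied by a rounded percentage."""
    return round(percent / 100 * overlap)


def identity_after_gain(percent, overlap, gained=1):
    """Percent identity after `gained` extra positions become identical."""
    n = implied_identities(percent, overlap) + gained
    return round(100 * n / overlap, 1)


# ORF106 vs ORF106a: 87.4% of 199 implies 174 identical residues.
print(implied_identities(87.4, 199))   # prints: 174
# The K->N change at residue 111 makes one more position identical.
print(identity_after_gain(87.4, 199))  # prints: 87.9
# Same reasoning for the gonococcal comparison quoted later (90.5%).
print(identity_after_gain(90.5, 199))  # prints: 91.0
```

One additional matching residue out of 199 raises the identity by about 0.5 percentage points, which accounts for the difference between the ORF106 and ORF106-1 figures.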



The complete length ORF106a nucleotide sequence <SEQ ID 369> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 370>: 

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX 

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. 
gonorrhoeae: 

orf 106. pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLS IS SRFQTELPDQ 59 

orfl06ng MAFITRLFKS IKQWLVLLP I LSVLPDAAAEGIAATRAEARITDGGRLS IS SRFQTELPDQ 60 

orf 106. pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 119 

orfl06ng LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120 

orf 106 . pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 17 9 

I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I t I I 
orfl06ng FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180 

orfl06.pep SQNWHLDSGWKPLNI IGNK 198 

orfl06ng SQNWHLDSGWKPLNI IGNK 199 

Due to the K->N substitution at residue 111, the homology between ORF106ng and ORF106-1 is 
91.0% over the same 199 aa overlap. 

The complete length ORF106ng nucleotide sequence <SEQ ID 371> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS 

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN 

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the 
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that 
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 44 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
373>: 



1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC 
1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>: 

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA 

451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE 

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 

Computer analysis of this amino acid sequence gave the following results: 
Prediction 

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises 
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide. 
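The text does not say which program produced the transmembrane prediction. Purely as an illustration, the sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window; the window length of 19 and the threshold of 1.6 are conventional choices assumed here, not values taken from the original, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; unknown residues score 0.0."""
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean meets the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


# Residues 11-40 of ORF10-1 (SEQ ID 376), used only as sample input.
segment = "AGSIGSAVLAVIILPLLSWYFPADDIGRIV"
print(candidate_tm_windows(segment))
```

Runs of consecutive start positions returned by `candidate_tm_windows` indicate a hydrophobic stretch long enough to be considered a candidate membrane-spanning segment; counting such runs over a full sequence gives a rough estimate only.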
Homology with EpsM from Streptococcus thermophilus (accession number U40830) 

ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of a size 
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with 
prokaryotic membrane proteins: 

Identities = (25%) 

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270 

L Y + PL SS+ +W L ++ R F+ + G G+ ++ + +IF+ W 

Sbjct: 210 L YYAL PL I PS S I LWWLLNAS SRYFVLFFLGAGANGLLAVATK I PS 1 1 S I FNT I FTQAW 267 

Identities = 15/57 (26%), Positives = 31/57 (54%) 

Query: 7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 

L + G++GS +L +++PL ++ + G L QT A L + ++ + + A +R 

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMT PQEYGMADLYQTTANLLLPLITMNVFDATLR 68 

Identities = 16/96 (16%), Positives = 36/96 (37%) 

Query: 307 IFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNWRKTRPIXXXXXXXXXX 366 

+ P+ ++ +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orflO.pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

or f 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 



250 260 270 280 290 300 

AGLEQLGVYSMGI S FGGAALLFQS I FSTVWT PYI FRAIEENAPPARLSATAESAAALLAS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I 1 I I I I I I I I Ill 

AGLEQLGVYSMGI S FGGAALLFQS I FSTVWT PYI FRAIEANAPPARLSATAE SAAALLAS 

250 260 270 280 290 300 
LGALAANLLLLGL-- 



orf 10 . pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 

orflOa LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10a nucleotide sequence <SEQ ID 377> is: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG 

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT 

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC 

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC 

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 378>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap: 

10 20 30 40 50 60 

orf 10-1. pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

or f 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
70 80 90 100 110 120 

or f 10-1. pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 10-1. pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I 
orf10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 10-1. pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf10-1.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I 
orf10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf10-1.pep ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 

orf 10a ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT 
310 320 330 340 350 360 

370 380 390 400 410 419 

LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

LGALAANLLLLGL— AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
370 380 390 400 410 

420 430 440 450 460 470 

orf 10-1. pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 

orf 10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 



orfl0-l .pep 
orflOa 



Homology with a predicted ORF from N. gonorrhoeae 

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N. 
gonorrhoeae: 

orf 10ng .pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orflOnm MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 

orflOng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 12 0 

orflOnm YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

orf lOng . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I 
orflOnm LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 18 0 

orflOng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240 

orflOnm NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 240 

orflOng.pep AGLEQLGVYSMG I S FGGAALLLQS I FSTVWT PY I FRAI EENAT PARLS ATAE SAAALLAS 300 

orflOnm AGLEQLGVYSMGISFGGAALLFQS I FSTVWT PYIFRAIEENAPPARLSATAESAAALLAS 300 

orflOng.pep ALCLTGI FS PLASLLLPEN YAAVRFTWSCMLPPLFYTLTE I SGI GLNVVRKTRPIALAT 360 

orflOnm ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 3 60 
370 380 390 400 410 

orf lOng . pep LGALAANLLLLGL— AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

I I I I I I : I : I I I I I : I I 

orflOnm LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

370 380 390 400 410 

420 430 440 450 460 470 

orf lOng.pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

orflOnm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is: 



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG 

151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA 

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc 

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG 

1151 CGCTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC 

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 380>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG 

451 CILRHRKNLH KLFHYLKKQG FPL* 

ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap: 



10 20 30 40 50 60 

orf 10-1. pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

orflOng-l MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10-1. pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I I I I I I I : I I I I I I I I I I I I I ! I I : I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I 
orfl0ng-l YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 
130 140 150 160 170 180 

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

orf 10ng-l LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1. pep NLAAAAFLL FQNRCRLKAVRHAPFS PAVLHRGLRYG I P I ALS S I AYWGLAS ADRLFLKKY 

orflOng-l NLAAAAFLL FQNRCRLKAVRRAPFS PAVLHRGLRYG I PLALS S LAYWGLAS ADRLFLKKY 
190 200 210 220 230 240 



250 260 270 280 290 300 

orf 10-1. pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 

I I I I I I I I I I I I I I I I = I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 

orfl0ng-l AGLEQLGVYSMGISFGGAALLLQSI FSTVWTPYIFRAIEENATPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 10-1. pep ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNVVRKTRPIALAT 

orfl0ng-l ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT 
310 320 330 340 350 360 



370 380 390 400 410 420 

orf 10-1 . pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 

I I I I 1111:11 II: Ill 111111:1111 

orf 10ng-l LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 

370 380 390 400 410 420 

430 440 450 460 470 

orf 10-1 . pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I : I ! I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I 
orf10ng-1 CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide and several 
transmembrane segments and the presence of a leucine-zipper motif (4 Leu residues spaced by 6 
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
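
The leucine-zipper criterion mentioned above (four Leu residues, each separated from the next by six residues) can be checked mechanically. The following is a minimal Python sketch under that reading of the motif, not the analysis actually used for this specification; the function name `find_leucine_zippers` and the demonstration sequence are hypothetical.

```python
import re

# Four leucines, each separated from the next by six arbitrary residues.
# The lookahead lets overlapping candidates be reported as well.
LEU_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (1-based start, 22-residue window) for every candidate motif."""
    # Remove the spaces and line breaks used in printed sequence listings.
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in LEU_ZIPPER.finditer(seq)]


# Synthetic demonstration sequence (hypothetical, not taken from the listings).
demo = "MK" + "LAAAAAA" * 3 + "LGG"
print(find_leucine_zippers(demo))
```

A hit from such a pattern search only flags a candidate; whether the segment actually forms a zipper is not something the pattern can establish.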



Example 45 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>: 

1 . . ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 



1 . . ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 
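
As an illustration of how a nucleotide listing maps to its amino acid listing, the following Python sketch translates a sequence in reading frame 1 with the standard genetic code. It is an assumption-laden sketch rather than the tool used to prepare the listings: the function name `translate` is hypothetical, and reporting any codon that contains a character other than A, C, G or T as X is inferred from the listings (for example, the codon AAm in SEQ ID 381 corresponds to X in SEQ ID 382).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of `dna`; unresolved codons become 'X', stops '*'."""
    # Drop whitespace; lower-case (uncertain) bases are read like upper-case.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Any codon with an ambiguity symbol is absent from the table.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First thirty bases of SEQ ID 381, row numbers removed.
print(translate("ATCCTGAAAC CGCATAACCA GCTTAAGGAA"))
```

Row numbers must be removed before calling the function, and a trailing incomplete codon is ignored.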

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK

201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N. 
meningitidis: 



10 20 30 

orf 65 .pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

1111:1 I I: I I I I I I 

orf 65a IIAGILF YLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 
30 40 50 60 70 80 



40 50 60 70 80 90 

or f 65 . pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

: I I I II: I I II I II I I I I I 

orf 65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 



100 110 120 130 140 150 

orf 65 . pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf 65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM 
150 160 170 180 190 200 



160 170 180 190 200 210 

orf 65 . pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf65a KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGISSKVVGYQAGHKTLYRVQSGNMSAD

210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA
351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG
401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA
601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT
651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC
851 GTTCTATCGA AAGCAAATAA



This encodes a protein having amino acid sequence <SEQ ID 386>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK 

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 



orf65a.pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK

I I I I I I I I II II I = I I I I I I I I

orf65-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK



orf65a.pep NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I Mill: I 

orf65-1 NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I III 

orf 65-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf65a.pep TPEQILNSGSIEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI

I I I I I I II I I I I I I Ill: II 

orf65-1 TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI

190 200 210 220 230 240 

250 260 270 280 290 

orf65a.pep SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX

I I I I I I I I I I I I I I I I I I I II I 

orf65-1 SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX

250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N. 
gonorrhoeae: 



ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 

III : I I llllll: : I I 

ORF65 ILKPHNQLKEDIQPDPADQNALSEPDAATE 
ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD



ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 



ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 
ORF65 

ORF65ng MR 

||

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino 
acid sequence <SEQ ID 388>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR 

251 DIKRFTACKA AICPPMR* 

After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be:




1 [..........] ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT
51 [..........] ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC
101 [..........] CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG
151 [..........] CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT
201 [..........] CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA
251 [..........] GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 [..........] ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga
351 [..........] ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG
401 [..........] cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA
451 [..........] tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga
501 [..........] gcggcgaaag aaaAAGttgc [..........] accccggaaC
551 [..........] cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa
601 [..........] AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT
651 [..........] gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg
701 [..........] aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG
751 [..........] CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc
801 [..........] ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA
851 [..........] TGAAGGCAAA TAA



This encodes the following amino acid sequence <SEQ ID 390>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK

201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA

251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK * 

ORF65ng-l and ORF65-1 show 89.0% identity in 290 aa overlap: 



orf65-1.pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK
orf65ng-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK
70 80 90 100 110 120

orf 65-1. pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

orf65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65-1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

orf65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
130 140 150 160 170 180 



190 200 210 220 230 239 

orf65-1.pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG



240 250 260 270 280 290 

orf 65-1 .pep ISSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I 
orf65ng-1 ISSEVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX
250 260 270 280 290 

On this basis, including the presence of a putative transmembrane domain in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
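
The specification does not state which algorithm was used to predict the transmembrane domain. Purely as an illustration, the following Python sketch applies a Kyte-Doolittle hydropathy scan, a commonly used stand-in; the window of 19 residues and the cut-off of 1.6 are conventional choices for that method and are assumptions here, as are the function names `hydropathy_profile` and `has_tm_candidate`.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    seq = "".join(protein.split()).upper()
    scores = [KD.get(residue, 0.0) for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def has_tm_candidate(protein, window=19, threshold=1.6):
    """True if any window reaches the assumed hydropathy cut-off."""
    return any(v >= threshold for v in hydropathy_profile(protein, window))


# First fifty residues of SEQ ID 390 as printed in the listing.
n_terminus = "MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ"
print(has_tm_candidate(n_terminus))
```

A positive result only marks a hydrophobic stretch long enough to span a membrane; it does not by itself distinguish a transmembrane segment from a cleavable leader peptide.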



Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
391>: 



1 ATGAACCACG ACATCACT"7T CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

Further work elaborated the DNA sequence <SEQ ID 393> as: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 

orfl03a MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103 . pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf103a GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103. pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf103a NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP

130 140 150 160 170 180 

190 200 210 220 

orf 103 . pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 103a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 

The complete length ORF103a nucleotide sequence <SEQ ID 395> is: 

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL* 

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

orf103a.pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI

I I I I I I I I I Mill I I I I I I I I I I I 

orf103-1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103a. pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

orf 103-1 GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf103a.pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP

Ml I I I I I I I I I I I I I I I! I I I I I I I I I II I I I I I I I I I 

orf 103-1 NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103a. pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II I I I I I I I I I I I I I I M I I I I I ! I I I I I I I I I I I I I I I I I 
orf 103-1 NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N. 
gonorrhoeae: 

orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 

II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I : I I I I I I 
orfl03ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

orf 103. pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

orfl03ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

orf 103 .pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 

orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

orf 103. pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 
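
The identity figure quoted for this alignment can be reproduced by counting identical columns. The following is a minimal Python sketch, not the alignment program used for the specification: the function name `percent_identity` is hypothetical, gap columns are excluded from the overlap by assumption, and an undetermined residue X is counted as a mismatch, which is what reproduces the stated value here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical columns between two aligned sequences.

    Columns where either sequence has a gap ('-') are left out of the
    overlap; this is an assumption about how the overlap is counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no overlapping positions")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# The four alignment rows printed above, joined without labels or counters.
ORF103 = (
    "MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI"
    "GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
    "NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP"
    "NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL"
)
ORF103NG = (
    "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI"
    "GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
    "NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP"
    "NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL"
)

# 212 identical columns out of 222.
print(len(ORF103), round(percent_identity(ORF103, ORF103NG), 1))  # 222 95.5
```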

The complete length ORF103ng nucleotide sequence <SEQ ID 397> is: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG 

51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC 

151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT 

201 CGGACAACTC GGCATTTCAC TCGACCAAAc CCgcgTCCTG CAAAATATTT 

251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC 

401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG 

501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT 

651 TGCCGTCCTG TGGCTGTAA 



This encodes a protein having amino acid sequence <SEQ ID 398>: 



